Browsing by Subject "Supervised Learning"


Computational Analysis of Churn in Multiplayer Online Games
(2015-05) Borbora, Zoheb
Churn refers to the loss of customers. Understanding churn behavior and accurately predicting likely churners is important for any business, as churn directly affects the customer base and thus revenue. Analysis of churn behavior is also important in terms of understanding factors of user engagement. As such, churn behavior has been studied across a wide range of industries such as telecom, banking and online social networks. However, most existing churn research has focused on modeling individual churn behavior, and the types of questions asked have also been limited by the types of datasets available to researchers. In this thesis, different aspects of churn in Massively Multiplayer Online Role-Playing Games (MMORPGs) are studied in depth. MMORPGs are persistent virtual environments that mimic complex physical spaces, and many of the behaviors which are observed in the real world are also observed in MMORPGs. Millions of players interact online in these environments and the game logs capture player activities in great detail. We first use a behavior modeling approach to analyze the player's behavior leading up to the point of churn and discover key indicators or behavioral trends which can help identify players who are going to churn. Using this approach, we do an extensive evaluation and comparison of two types of churn: Cancellation of Subscription and Dormancy. MMORPG environments are characterized by collaboration among players to achieve common goals in activities such as raids and group quests. We identify player communities which evolve over time in such game environments and extend the lifecycle-based approach to build models for predicting churn of these dynamically evolving communities. Models of player motivation seek to identify factors that motivate player behavior and can be helpful in analyzing and predicting churn behavior.
We study the impact of different achievement and socialization-based player motivational factors on player churn. Specifically, we are interested in studying how socialization serves to increase player engagement and decrease churn. Contagion processes arise broadly in the social and biological sciences and can be seen in, for example, the spread of infectious diseases, the diffusion of innovations, dissemination of religious doctrine and information diffusion in online social networks. As per theories of social contagion, behavior and emotions can be transmitted between individuals in a population. We study the relationship between player churn and social contagion, i.e., the impact on a player's immediate neighborhood when that player leaves the network. All of the existing churn research has focused on factors which lead to churn. We study the interpersonal effects which can cause spread of churn behavior in a network as well as the factors which keep a player in the network after a neighbor has churned.
Enhancing Machine Learning Classification for Electrical Time Series with Additional Domain Applications
(2019-11) Valovage, Mark
Recent advances in machine learning have significant, far-reaching potential in electrical time series applications. However, many methods cannot currently be implemented in real-world applications due to multiple challenges. This thesis explores solutions to many of these challenges in an effort to realize the full potential of applying machine learning to dynamic electrical systems. This thesis focuses on two areas: electricity disaggregation and time series shapelets. However, the contributions below can be applied to dozens of other domains. Electricity disaggregation identifies individual appliances from one or more aggregate data streams. In first-world countries, disaggregation has the potential to eliminate billions of dollars of waste each year, while in developing countries, disaggregation could reduce costs enough to help provide electricity to over a billion people who currently have no access to it. Existing disaggregation methods cannot be applied to real-world households because they are too sensitive to varying noise levels, require parameters to be tuned to individual houses or appliances, make incorrect assumptions about real-world data, or are too resource intensive for inexpensive hardware. This thesis details label correction, a process to automatically correct user-labeled training samples, to increase classification accuracy. It also details an approach to unsupervised learning that is scalable to hundreds of millions of buildings using two novel approaches: event detection without parameter tuning and iterative discovery without appliance models. Time series shapelets are small subsequences of time series used for classification of unlabeled time series. While shapelets can be used for electricity disaggregation, they have applications to dozens of other domains. However, little research has been done on the distance metric used by shapelets.
This distance metric is critical, as it is the sole feature a shapelet uses to discriminate between samples from different classes. This thesis details two contributions to time series shapelets. The first, selective z-normalization, is a technique that increases the shapelet classification accuracy by discovering a combination of z-normalized and non-normalized shapelets. The second is computing shapelet-specific distances, a technique to increase accuracy by finding a unique distance metric for each shapelet.
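To make the role of the shapelet distance concrete, the following is a minimal sketch of a common formulation: the minimum Euclidean distance between a shapelet and every same-length window of a time series, computed with or without z-normalization. It is an illustration only and is not taken from the thesis; the function names `z_normalize` and `shapelet_distance` and the `normalize` flag are hypothetical, and the thesis's procedure for choosing between normalized and non-normalized shapelets is not reproduced here.

```python
import math
from statistics import mean, pstdev


def z_normalize(values):
    """Rescale values to zero mean and unit variance.

    Constant inputs (zero standard deviation) map to all zeros
    to avoid dividing by zero.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def shapelet_distance(series, shapelet, normalize=True):
    """Minimum Euclidean distance between the shapelet and any window of the series.

    With normalize=True, the shapelet and each window are z-normalized first,
    so only shape matters. With normalize=False, raw amplitudes are compared.
    """
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    ref = z_normalize(shapelet) if normalize else list(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        if normalize:
            window = z_normalize(window)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, ref)))
        best = min(best, dist)
    return best


# Hypothetical example: a peak of the same shape but ten times the amplitude.
series = [0, 0, 10, 20, 10, 0, 0]
shapelet = [1, 2, 1]
dist_normalized = shapelet_distance(series, shapelet, normalize=True)
dist_raw = shapelet_distance(series, shapelet, normalize=False)
```

In this example the normalized distance is essentially zero because the peak has the same shape as the shapelet, while the raw distance is large because the amplitudes differ. Which variant discriminates better depends on whether amplitude carries class information, which is the kind of trade-off a per-shapelet choice of normalization addresses.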
Multi-source Data Decomposition and Prediction for Various Data Types
(2022-12) Palzer, Elise
Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in molecular biomedical research. Recent methods have sought to uncover underlying structure and relationships within and/or between the data sources, and other methods have sought to build a predictive model for an outcome using all sources. However, existing methods that do both are presently limited because they either (1) consider only data structure shared by all datasets while ignoring structures unique to each source, or (2) extract underlying structures first without considering the outcome. In Chapter 2, we propose a method called supervised joint and individual variation explained (sJIVE) [1] that can simultaneously (1) identify shared (joint) and source-specific (individual) underlying structure and (2) build a linear prediction model for an outcome using these structures. Simulations show sJIVE to outperform existing methods when large amounts of noise are present in the multi-source data, and an application to data from the COPDGene study reveals gene expression and proteomic patterns that are predictive of lung function. In Chapter 3, we extend sJIVE to allow for binary and/or count data and to incorporate sparsity using a method called sparse exponential family sJIVE (sesJIVE). Simulations show the non-sparse version of sesJIVE to outperform existing methods when the data are Bernoulli- or Poisson-distributed with large amounts of noise, and sesJIVE outperforms other JIVE-based methods in our application with COPDGene data. Lastly, Chapter 4 discusses our R package, sup.r.jive, that implements sJIVE, sesJIVE, and a previous method called JIVE-Predict [2]. Summary and visualization tools are also available within our R package for all three methods.

University Digital Conservancy
