Latent Factorization for Hierarchical and Explainable Embeddings and Data Disaggregation
2021-08
Loading...
View/Download File
Persistent link to this item
Statistics
View StatisticsJournal Title
Journal ISSN
Volume Title
Title
Latent Factorization for Hierarchical and Explainable Embeddings and Data Disaggregation
Alternative title
Authors
Published Date
2021-08
Publisher
Type
Thesis or Dissertation
Abstract
A tremendous growth in data collection has been an important enabler of the recent upsurge in Machine Learning (ML) models. ML techniques involve processing, analyzing, and discovering patterns from real user generated data. These data are usually high-dimensional, sparse, incomplete, and, in many applications, are only available at coarse granularity. For instance, a location mode can be at a state-level rather than county, or a time mode can be on a monthly basis instead of weekly. These (dis)aggregation challenges in real world data raise some intriguing questions and bring some challenging tasks. Given coarse-granular/aggregated data (e.g., monthly summaries), can we recover the fine-granular data (e.g., the daily counts)? Aggregated data enjoy concise representations and thus can be stored and transferred efficiently, which is critical in the era of data deluge. On the other hand, recent ML models are data hungry and benefit from detailed data for personalized analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. In this thesis, we provide data disaggregation frameworks for one-dimensional time series data and multidimensional (tensor) data. The developed models recognize the structure of the data and exploit it to reduce the number of unknown parameters. In a related setting, multidimensional data are often partially observed, e.g., recommender systems data are usually extremely sparse as a user interacts with only a small subset of the available items. Can we reconstruct/complete the missing data? This question is central in many recommendation and more general prediction tasks in various applications such as healthcare, learning and business analytics. A major challenge stems from the fact that the number of unknown parameters is usually much larger than the number of observed samples, which has motivated using prior information. Imposing the appropriate regularization prior limits the solution search to the ‘right’ space. In addition to sparsity, high-dimensionality also creates the challenge of ‘hiding’ the underlying structures and causes that can explain the data. In order to tackle this ‘dimensionality curse’, many dimensionality reduction (DR) methods such as principal component analysis (PCA) have been proposed. The reduced dimension data usually yields better performance in downstream tasks, such as clustering. This suggests that the underlying structure (e.g., clustering) is more pronounced in some low-dimensional space compared to the original data domain. In this thesis, we present principled approaches that bridge incorporating prior information and DR techniques. We rely on low-rank (nonnegative) matrix factorization for DR and incorporate two different types of priors: i) hierarchical tree clustering, and ii) user-item embedding relationships. Imposing these regularization priors not only improves the quality of latent representations, but it also helps reveal more of the underlying structure in latent space. The tree prior provides a meaningful hierarchical clustering in an unsupervised data-driven fashion, while the user-item relationships underpin the latent factors and explain how the resulting recommendations are formed.
Keywords
Description
University of Minnesota Ph.D. dissertation. August 2021. Major: Electrical/Computer Engineering. Advisor: Nicholas Sidiropoulos. 1 computer file (PDF); x, 121 pages.
Related to
Replaces
License
Collections
Series/Report Number
Funding information
Isbn identifier
Doi identifier
Previously Published Citation
Other identifiers
Suggested citation
Almutairi, Faisal. (2021). Latent Factorization for Hierarchical and Explainable Embeddings and Data Disaggregation. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/259757.
Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.