Latent Factorization for Hierarchical and Explainable Embeddings and Data Disaggregation

Tremendous growth in data collection has been an important enabler of the recent upsurge in Machine Learning (ML) models. ML techniques involve processing, analyzing, and discovering patterns in real user-generated data. These data are usually high-dimensional, sparse, incomplete, and, in many applications, available only at coarse granularity. For instance, a location mode can be at the state level rather than the county level, or a time mode can be monthly instead of weekly. These (dis)aggregation challenges in real-world data raise intriguing questions and pose challenging tasks. Given coarse-granular/aggregated data (e.g., monthly summaries), can we recover the fine-granular data (e.g., the daily counts)? Aggregated data enjoy concise representations and thus can be stored and transferred efficiently, which is critical in the era of data deluge. On the other hand, recent ML models are data hungry and benefit from detailed data for personalized analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. In this thesis, we provide data disaggregation frameworks for one-dimensional time series data and multidimensional (tensor) data. The developed models recognize the structure of the data and exploit it to reduce the number of unknown parameters.

In a related setting, multidimensional data are often partially observed; e.g., recommender systems data are usually extremely sparse, as a user interacts with only a small subset of the available items. Can we reconstruct/complete the missing data? This question is central to many recommendation and more general prediction tasks in applications such as healthcare, learning, and business analytics. A major challenge stems from the fact that the number of unknown parameters is usually much larger than the number of observed samples, which has motivated the use of prior information. Imposing an appropriate regularization prior limits the solution search to the ‘right’ space.

In addition to sparsity, high dimensionality also creates the challenge of ‘hiding’ the underlying structures and causes that can explain the data. To tackle this ‘dimensionality curse’, many dimensionality reduction (DR) methods such as principal component analysis (PCA) have been proposed. The reduced-dimension data usually yield better performance in downstream tasks, such as clustering. This suggests that the underlying structure (e.g., clustering) is more pronounced in some low-dimensional space than in the original data domain. In this thesis, we present principled approaches that bridge the incorporation of prior information and DR techniques. We rely on low-rank (nonnegative) matrix factorization for DR and incorporate two different types of priors: i) hierarchical tree clustering, and ii) user-item embedding relationships. Imposing these regularization priors not only improves the quality of latent representations but also helps reveal more of the underlying structure in latent space. The tree prior provides a meaningful hierarchical clustering in an unsupervised data-driven fashion, while the user-item relationships underpin the latent factors and explain how the resulting recommendations are formed.

Description

University of Minnesota Ph.D. dissertation. August 2021. Major: Electrical/Computer Engineering. Advisor: Nicholas Sidiropoulos. 1 computer file (PDF); x, 121 pages.

Collections

Dissertations

Suggested citation

Almutairi, Faisal. (2021). Latent Factorization for Hierarchical and Explainable Embeddings and Data Disaggregation. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/259757.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.

University Digital Conservancy

University of Minnesota Twin Cities