Bayesian Dimension Reduction and Prediction with Multiple Datasets

Title

Bayesian Dimension Reduction and Prediction with Multiple Datasets

Published Date

2023-06

Type

Thesis or Dissertation

Abstract

Biomedical investigators are increasingly able to collect multiple sources of omics data to better understand disease pathogenesis. Integrative factorization methods for multi-omic datasets have been developed to reveal latent biological patterns driving variation among the observations. However, few methods can predict clinical or biological outcomes from datasets with this complex structure. In Chapter 2, we propose a framework for dimension reduction and prediction in the context of multi-omic, multi-cohort (bidimensional) datasets. We also extend the spike-and-slab prior, a widely used Bayesian variable selection approach, to accommodate hierarchical variable selection across multiple regression models. We apply this framework to multi-omic data from The Cancer Genome Atlas to predict overall survival across disparate cancer types, and we identify multi-omic biological patterns related to survival that persist across multiple cancers. In Chapter 3, we propose a Bayesian framework to perform either integrative factorization or simultaneous factorization and prediction, which we term Bayesian Simultaneous Factorization and Prediction (BSFP). BSFP concurrently estimates latent factors driving variation within and across omics datasets while estimating their effects on an outcome, providing a complete framework for quantifying uncertainty. We show via simulation the importance of accounting for uncertainty in the estimated factorization within the predictive model, as well as the flexibility of this framework for multiple imputation. We also apply BSFP to metabolomic and proteomic data to predict lung function decline among individuals living with HIV. Finally, in Chapter 4, we extend the framework described in Chapter 3 to accommodate simultaneous factorization and prediction using bidimensional data, i.e., across multiple omics sources and multiple sample cohorts, which we term multi-cohort BSFP, or MCBSFP. We evaluate the performance of this framework in recovering latent variation structures via simulation, and we use this model to reanalyze the proteomic and metabolomic data from the study considered in Chapter 3.
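
As a rough illustration of the spike-and-slab prior mentioned in the abstract, the sketch below draws regression coefficients from the standard point-mass form of that prior: each coefficient is either exactly zero (the spike) or drawn from a normal distribution (the slab). This is not the hierarchical extension developed in the dissertation. The function name, parameter names, and numeric values are assumptions chosen only for illustration.

```python
import numpy as np


def sample_spike_and_slab(n_coef, inclusion_prob, slab_sd, rng):
    """Draw coefficients from a point-mass spike-and-slab prior.

    Hypothetical helper for illustration; not taken from the dissertation.
    """
    # gamma[j] is True when coefficient j is "included" (comes from the slab).
    gamma = rng.random(n_coef) < inclusion_prob
    # Slab component: Normal(0, slab_sd^2). Spike component: exactly zero.
    slab_draws = rng.normal(0.0, slab_sd, size=n_coef)
    beta = np.where(gamma, slab_draws, 0.0)
    return beta, gamma


# Assumed settings: 1000 coefficients, 20% prior inclusion probability.
rng = np.random.default_rng(0)
beta, gamma = sample_spike_and_slab(1000, 0.2, 1.0, rng)
print("share of nonzero coefficients:", np.count_nonzero(beta) / beta.size)
```

In a full Bayesian analysis the inclusion indicators would be updated from the data rather than only sampled from the prior; the sketch shows just the prior's sparsity-inducing shape.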

Description

University of Minnesota Ph.D. dissertation. June 2023. Major: Biostatistics. Advisor: Eric Lock. 1 computer file (PDF); x, 134 pages.

Suggested citation

Samorodnitsky, Sarah. (2023). Bayesian Dimension Reduction and Prediction with Multiple Datasets. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/258655.
