Bayesian Dimension Reduction and Prediction with Multiple Datasets
2023-06
Type
Thesis or Dissertation
Abstract
Biomedical investigators are increasingly able to collect multiple sources of omics data to better understand disease pathogenesis. Integrative factorization methods for multi-omic datasets have been developed to reveal latent biological patterns driving variation among the observations. However, few methods can accommodate prediction of clinical or biological outcomes within datasets having this complex structure. In Chapter 2, we propose a framework for dimension reduction and prediction in the context of multi-omic, multi-cohort (bidimensional) datasets. We also extend a widely used Bayesian variable selection approach, the spike-and-slab prior, to accommodate hierarchical variable selection across multiple regression models. We apply this framework to multi-omic data from The Cancer Genome Atlas to predict overall survival across disparate cancer types, and we identify multi-omic biological patterns related to survival that persist across multiple cancers. In Chapter 3, we propose a Bayesian framework to perform either integrative factorization or simultaneous factorization and prediction, which we term Bayesian Simultaneous Factorization and Prediction (BSFP). BSFP concurrently estimates latent factors driving variation within and across omics datasets while estimating their effects on an outcome, providing a complete framework for quantifying uncertainty. We show via simulation the importance of accounting for uncertainty in the estimated factorization within the predictive model and the flexibility of this framework for multiple imputation. We also apply BSFP to metabolomic and proteomic data to predict lung function decline among individuals living with HIV. Finally, in Chapter 4, we extend the framework described in Chapter 3 to accommodate simultaneous factorization and prediction using bidimensional data, i.e., across multiple omics sources and multiple sample cohorts, which we term multi-cohort BSFP (MCBSFP). We evaluate the performance of this framework in recovering latent variation structures via simulation, and we use this model to reanalyze the proteomic and metabolomic data from the study considered in Chapter 3.
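For readers unfamiliar with the spike-and-slab prior named in the abstract, its standard single-regression form can be sketched as follows. This is only the textbook formulation. The symbols $\beta_j$, $\gamma_j$, $\tau^2$, and $\pi$ are notation assumed here for illustration, and the hierarchical extension across multiple regression models developed in the dissertation is not reproduced in this record.

```latex
\begin{align*}
\beta_j \mid \gamma_j &\sim \gamma_j \, \mathcal{N}(0, \tau^2) + (1 - \gamma_j) \, \delta_0, \\
\gamma_j \mid \pi &\sim \operatorname{Bernoulli}(\pi), \qquad j = 1, \dots, p.
\end{align*}
```

Here $\delta_0$ is a point mass at zero (the spike), $\mathcal{N}(0, \tau^2)$ is the slab, and the posterior mean of the indicator $\gamma_j$ is the posterior inclusion probability of coefficient $\beta_j$.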
Description
University of Minnesota Ph.D. dissertation. June 2023. Major: Biostatistics. Advisor: Eric Lock. 1 computer file (PDF); x, 134 pages.
Suggested citation
Samorodnitsky, Sarah. (2023). Bayesian Dimension Reduction and Prediction with Multiple Datasets. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/258655.