Samorodnitsky, Sarah
2023-11-28
2023-11-28
2023-06
https://hdl.handle.net/11299/258655
University of Minnesota Ph.D. dissertation. June 2023. Major: Biostatistics. Advisor: Eric Lock. 1 computer file (PDF); x, 134 pages.

Biomedical investigators are increasingly able to collect multiple sources of omics data in pursuit of understanding disease pathogenesis. Integrative factorization methods for multi-omic datasets have been developed to reveal latent biological patterns that drive variation among the observations. However, few methods can accommodate prediction of clinical or biological outcomes within datasets having this complex structure. In Chapter 2, we propose a framework for dimension reduction and prediction in the context of multi-omic, multi-cohort (bidimensional) datasets. We also extend the spike-and-slab prior, a widely used Bayesian variable selection approach, to accommodate hierarchical variable selection across multiple regression models. We apply this framework to multi-omic data from The Cancer Genome Atlas to predict overall survival across disparate cancer types, and we identify multi-omic biological patterns related to survival that persist across multiple cancers. In Chapter 3, we propose a Bayesian framework, which we term Bayesian Simultaneous Factorization and Prediction (BSFP), that performs either integrative factorization alone or simultaneous factorization and prediction. BSFP concurrently estimates the latent factors driving variation within and across omics datasets and their effects on an outcome, providing a complete framework for uncertainty quantification. Through simulation, we show the importance of accounting for uncertainty in the estimated factorization within the predictive model, and we demonstrate the flexibility of this framework for multiple imputation. We also apply BSFP to metabolomic and proteomic data to predict lung function decline among individuals living with HIV.
Finally, in Chapter 4, we extend the framework described in Chapter 3 to accommodate simultaneous factorization and prediction using bidimensional data, i.e., data spanning multiple omics sources and multiple sample cohorts; we term this extension multi-cohort BSFP (MCBSFP). We evaluate via simulation how well this framework recovers latent variation structures, and we use it to reanalyze the proteomic and metabolomic data from the study considered in Chapter 3.

en
Bayesian hierarchical modeling
Bidimensionally-linked matrices
Integrative Factorization
Multi-omics
Spike-and-slab priors

Bayesian Dimension Reduction and Prediction with Multiple Datasets
Thesis or Dissertation