Browsing by Subject "Multi-source"

Bayesian Modeling of Multi-Source Multi-Way Data
(2023-11) Kim, Jonathan
Biomedical research often involves data collected from multiple sources, and these sources often have a multi-way (i.e., multidimensional tensor) structure. Existing methods that can accommodate multi-source or multi-way data have various limitations on the exact structure of the data they can accommodate and on the type of predictions, if any, they can produce. Furthermore, few of these methods can handle data that are simultaneously multi-source and multi-way. We first introduce two such multi-source and multi-way datasets of molecular and hematological data from multiple sources, each measured over multiple developmental time points and in multiple tissues, as predictors of early-life iron deficiency (ID) in a rhesus monkey model. We describe preliminary analyses that were conducted on these datasets using existing methods. We then develop a Bayesian linear model that can perform prediction on a binary or continuous outcome and can accommodate data that are both multi-source and multi-way. We use a linear model with a low-rank structure on the coefficients to capture multi-way dependence and model the variance of the coefficients separately across each source to infer their relative contributions. Conjugate priors facilitate an efficient Gibbs sampling algorithm for posterior inference, assuming a continuous outcome with normal errors or a binary outcome with a probit link. Simulations demonstrate that our model performs as expected in terms of misclassification rates and correlation of estimated coefficients with true coefficients, with large gains in performance from incorporating multi-way structure and modest gains from accounting for differing signal sizes across the different sources. Moreover, it provides robust classification of ID monkeys for one of our motivating datasets.
Finally, we propose a flexible method called Bayesian regression on numerous tensors (BRONTe) that can predict a continuous or binary outcome from data that are collected from an arbitrary number of sources with multi-way tensor structures of arbitrary, not necessarily equal, orders. Additionally, BRONTe is able to accommodate data where some sources partially share features within a dimension. Simulations show BRONTe to perform well at prediction when the data sources are of unequal dimensions. In an application to our other motivating dataset on multi-way measures of metabolomics and hematology parameters, BRONTe was capable of robust classification of early-life iron deficiency.
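As a rough illustration of what "a low-rank structure on the coefficients" can mean for two-way predictors, the sketch below builds a coefficient array from a small number of factor vectors and checks that the linear predictor can be computed from the factors alone. It is not the thesis's model or its Gibbs sampler: the sizes, the rank, the variable names, and the use of NumPy are all assumptions made for illustration, and only a single source is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n subjects, p features, d time points, rank R.
n, p, d, R = 50, 20, 4, 2

# Low-rank coefficient array B = sum_r u_r v_r^T (rank at most R).
U = rng.normal(size=(p, R))  # feature loadings (illustrative)
V = rng.normal(size=(d, R))  # time-point loadings (illustrative)
B = U @ V.T                  # p x d coefficient array

# Two-way predictors for each subject, stored as an n x p x d array.
X = rng.normal(size=(n, p, d))

# Linear predictor: inner product <X_i, B> for each subject.
eta_full = np.einsum("ipd,pd->i", X, B)

# The same quantity computed directly from the factors, without forming B.
eta_lowrank = np.einsum("ipd,pr,dr->i", X, U, V)

# Continuous outcome with normal errors, and a binary outcome generated
# through a latent normal variable (the probit construction).
y_cont = eta_full + rng.normal(size=n)
y_bin = (eta_full + rng.normal(size=n) > 0).astype(int)

# Number of free coefficients with and without the low-rank constraint.
n_params_full = p * d
n_params_lowrank = R * (p + d)
```

With these assumed sizes the unconstrained array has 80 coefficients while the rank-2 version has 48 factor entries, and the gap widens as more dimensions or larger dimensions are added.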
Multi-source Data Decomposition and Prediction for Various Data Types
(2022-12) Palzer, Elise
Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in molecular biomedical research. Some recent methods seek to uncover underlying structure and relationships within and/or between the data sources, while others seek to build a predictive model for an outcome using all sources. However, existing methods that do both are limited because they either (1) consider only data structure shared by all datasets while ignoring structures unique to each source, or (2) extract underlying structures first without considering the outcome. In Chapter 2, we propose a method called supervised joint and individual variation explained (sJIVE) [1] that can simultaneously (1) identify shared (joint) and source-specific (individual) underlying structure and (2) build a linear prediction model for an outcome using these structures. Simulations show that sJIVE outperforms existing methods when large amounts of noise are present in the multi-source data, and an application to data from the COPDGene study reveals gene expression and proteomic patterns that are predictive of lung function. In Chapter 3, we extend sJIVE to allow for binary and/or count data and to incorporate sparsity using a method called sparse exponential family sJIVE (sesJIVE). Simulations show that the non-sparse version of sesJIVE outperforms existing methods when the data are Bernoulli- or Poisson-distributed with large amounts of noise, and sesJIVE outperforms other JIVE-based methods in our application to COPDGene data. Lastly, Chapter 4 discusses our R package, sup.r.jive, which implements sJIVE, sesJIVE, and a previous method called JIVE-Predict [2]. Summary and visualization tools are also available within our R package for all three methods.
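To make the idea of joint and individual structure feeding a linear prediction model more concrete, the sketch below simulates two data sources that share a set of joint scores, each with its own individual scores, and an outcome that is linear in those scores. It is only a data-generating illustration in Python, not the sJIVE estimation procedure and not the sup.r.jive package: the features-by-subjects layout, the ranks, the noise levels, and every variable name are assumptions. In practice the scores are unknown and must be estimated; here the true scores are used only to show that the outcome is linear in them.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100                 # subjects (columns, in the layout assumed here)
p1, p2 = 30, 40         # features in source 1 and source 2
r_joint, r_ind = 2, 1   # hypothetical joint and individual ranks

# Scores: joint scores are shared by both sources, individual scores are not.
S_joint = rng.normal(size=(r_joint, n))
S_ind1 = rng.normal(size=(r_ind, n))
S_ind2 = rng.normal(size=(r_ind, n))

# Loadings mapping scores to features in each source.
U1, U2 = rng.normal(size=(p1, r_joint)), rng.normal(size=(p2, r_joint))
W1, W2 = rng.normal(size=(p1, r_ind)), rng.normal(size=(p2, r_ind))

# Each source = joint structure + individual structure + noise.
X1 = U1 @ S_joint + W1 @ S_ind1 + 0.5 * rng.normal(size=(p1, n))
X2 = U2 @ S_joint + W2 @ S_ind2 + 0.5 * rng.normal(size=(p2, n))

# Outcome is linear in the joint and individual scores.
theta_true = np.array([1.0, -2.0, 0.5, 1.5])      # illustrative coefficients
scores = np.vstack([S_joint, S_ind1, S_ind2])     # (r_joint + 2 * r_ind) x n
y = theta_true @ scores + 0.1 * rng.normal(size=n)

# Least squares on the true scores recovers the coefficients approximately.
theta_hat, *_ = np.linalg.lstsq(scores.T, y, rcond=None)
```

Extracting the scores first and regressing on them afterwards would correspond to the two-step approach that the abstract contrasts with fitting the structure and the prediction model simultaneously.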

University Digital Conservancy
