Integrated Multi-Omics Approach to Predict Dementia: Using an Explainable Variational Autoencoder (E-VAE) Classifier Model

Vivek, Sithara
2023-09-19
2023-04
https://hdl.handle.net/11299/257038
University of Minnesota Ph.D. dissertation. April 2023. Major: Computer Science. Advisors: Bharat Thyagarajan, Weihua Guan. 1 computer file (PDF); xi, 97 pages.

Alzheimer’s disease (AD) and AD-related dementias (ADRD) are complex multifactorial processes in which epigenetic and biochemical changes occur many years before the onset of clinical symptoms. Over the last decade, large amounts of high-throughput molecular data, including genetic variants and epigenetic and transcriptomic data from blood and brain tissues, have improved our understanding of the complex molecular mechanisms associated with AD/ADRD pathways. Applying deep learning methods to integrated multi-omics data may be a powerful approach to elucidating the biological mechanisms of AD. This dissertation aims to develop a framework that processes high-dimensional genomics data and integrates multi-omics data to classify dementia with an end-to-end deep learning classifier model. We developed an end-to-end deep learning explainable variational autoencoder (E-VAE) classifier model and applied it to genome-wide genetic variants (GWAS SNPs; accuracy = 0.71, sensitivity = 0.73; Chapter 2), transcriptomic data (RNA-Seq; accuracy = 0.83, sensitivity = 0.77; Chapter 3), and epigenetic data (DNA methylation; accuracy = 0.79, sensitivity = 0.88; Chapter 4) collected from 2700 study participants in the Health and Retirement Study (HRS). We then used a framework to integrate genetic variants and RNA-Seq data and developed a multi-omics (GWAS SNPs + RNA-Seq) E-VAE classifier model to predict dementia (Chapter 5), with accuracy = 0.73 and sensitivity = 0.73.
We evaluated the generalizability of the E-VAE classifier models in an external dataset from the Religious Orders Study/Memory and Aging Project (ROSMAP), where the multi-omics E-VAE classifier model achieved accuracy = 0.67 and sensitivity = 0.77. We found that the integrated multi-omics E-VAE classifier model generalized better to the external data than a penalized logistic regression model trained on GWAS SNPs and RNA-Seq (accuracy = 0.73 and sensitivity = 0.33). Using the linear decoder in the E-VAE classifier model, we extracted biologically interpretable latent features and translated the top-weighted genes into biological insights. We identified genes known to be involved in the pathogenesis of AD/ADRD as well as novel genes that had not previously been studied in association with AD/ADRD. In summary, this dissertation demonstrates the utility of deep learning methods for analyzing complex multi-omics data to classify AD/ADRD. The explainable deep learning model allowed us to interpret the biological importance of deep representations of multi-omics features while optimizing a classifier model for dementia, and to generate new hypotheses to advance our understanding of the pathobiology of AD/ADRD.