Detecting Biomarkers among Subgroups with Structured Latent Features and Multitask Learning Methods
2017-05
Type
Thesis or Dissertation
Abstract
Because of disease progression and heterogeneity among samples and single cells, biomarker detection among subgroups is important: it provides a better understanding of population genetics and cancer causation. In this thesis, we proposed several methods based on structured latent features and multitask learning for biomarker detection in DNA copy-number variation (CNV) data and single-cell RNA sequencing (scRNA-seq) data. By incorporating known group information or taking domain heterogeneity into consideration, our models achieve meaningful biomarker detection and accurate sample classification.
1. By incorporating population relationships from the human phylogenetic tree, we introduced a latent feature model to detect population-differentiation CNV markers. The algorithm, named tree-guided sparse group selection (treeSGS), detects sample subgroups organized by a population phylogenetic tree, so that the evolutionary relations among the populations are incorporated for more accurate detection of population-differentiation CNVs.
2. We applied a transfer learning technique to cross-cancer-type CNV studies. We proposed the Transfer Learning with Fused LASSO (TLFL) algorithm, which detects latent CNV components from multiple CNV datasets of different tumor types and distinguishes the CNVs that are common across the datasets from those that are specific to each dataset. Both the common and type-specific CNVs are detected as latent components in a matrix factorization coupled with fused LASSO on adjacent CNV probe features.
3. We further applied the multitask learning idea to scRNA-seq data. We introduced variance-driven multitask clustering on single-cell RNA-seq data (scVDMC), which utilizes multiple cell populations from biological replicates or related samples with significant biological variance. scVDMC clusters single cells that share similar cell types and markers but show varying expression patterns across different domains, so that the scRNA-seq data are adjusted for better integration.
We evaluated our models on simulations and on several publicly available CNV and scRNA-seq datasets, together with one in-house scRNA-seq dataset. The promising results show that the models achieve better biomarker prediction among subgroups.
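The abstract mentions fused LASSO on adjacent CNV probe features without spelling out the penalty. As a rough illustration only, the sketch below computes the standard fused LASSO penalty for a single latent component: an L1 term on the values plus an L1 term on differences between neighbouring probes. It is not the TLFL algorithm from the thesis; the function name, the weight parameters, and the toy vectors are all assumptions made for this example.

```python
def fused_lasso_penalty(component, sparsity_weight, smoothness_weight):
    """Standard fused LASSO penalty for one component over ordered probes.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    # L1 term: encourages most probe values to be exactly zero.
    sparsity = sum(abs(value) for value in component)
    # Fusion term: encourages adjacent probes to share the same value,
    # which favours contiguous gained or lost segments.
    smoothness = sum(
        abs(right - left) for left, right in zip(component, component[1:])
    )
    return sparsity_weight * sparsity + smoothness_weight * smoothness


# Toy components (made-up values): one contiguous segment vs. an alternating pattern.
segment = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
alternating = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

segment_penalty = fused_lasso_penalty(segment, 1.0, 1.0)
alternating_penalty = fused_lasso_penalty(alternating, 1.0, 1.0)
```

Both toy vectors have the same L1 norm, so the difference in penalty comes entirely from the fusion term, which is why a contiguous segment is cheaper than an alternating pattern under this penalty.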
Description
University of Minnesota Ph.D. dissertation. May 2017. Major: Computer Science. Advisor: Rui Kuang. 1 computer file (PDF); viii, 89 pages.
Suggested citation
Zhang, Huanan. (2017). Detecting Biomarkers among Subgroups with Structured Latent Features and Multitask Learning Methods. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/190520.