Zhang, Huanan
2017-10-09
2017-10-09
2017-05
https://hdl.handle.net/11299/190520
University of Minnesota Ph.D. dissertation. May 2017. Major: Computer Science. Advisor: Rui Kuang. 1 computer file (PDF); viii, 89 pages.
Because of disease progression and heterogeneity in samples and single cells, biomarker detection among subgroups is important, as it provides a better understanding of population genetics and the causes of cancer. In this thesis, we proposed several methods based on structured latent features and multitask learning for biomarker detection on DNA copy-number variation (CNV) data and single-cell RNA sequencing (scRNA-seq) data. By incorporating known group information or taking domain heterogeneity into consideration, our models are able to achieve meaningful biomarker detection and accurate sample classification.
1. By incorporating population relationships from the human phylogenetic tree, we introduced a latent feature model to detect population-differentiation CNV markers. The algorithm, named tree-guided sparse group selection (treeSGS), detects sample subgroups organized by a population phylogenetic tree, so that the evolutionary relations among the populations are incorporated for more accurate detection of population-differentiation CNVs.
2. We applied transfer learning techniques to cross-cancer-type CNV studies. We proposed the Transfer Learning with Fused LASSO (TLFL) algorithm, which detects latent CNV components from multiple CNV datasets of different tumor types and distinguishes the CNVs that are common across the datasets from those that are specific to each dataset. Both the common and type-specific CNVs are detected as latent components in a matrix factorization coupled with fused LASSO on adjacent CNV probe features.
3. We further applied the idea of multitask learning to scRNA-seq data.
We introduced variance-driven multitask clustering on single-cell RNA-seq data (scVDMC), which utilizes multiple cell populations from biological replicates or related samples with significant biological variances. scVDMC clusters single cells that share similar cell types and markers but vary in expression patterns across different domains, so that the scRNA-seq data are adjusted for better integration.
We used both simulations and several publicly available CNV and scRNA-seq datasets, including one in-house scRNA-seq dataset, to evaluate the performance of our models. The results are promising and show that our models achieve better biomarker prediction among subgroups.
en
Copy Number Variation
Latent features
Machine Learning
Transfer Learning
Detecting Biomarkers among Subgroups with Structured Latent Features and Multitask Learning Methods
Thesis or Dissertation