Network-based learning algorithms for understanding human disease

Hwang, Tae Hyun
University of Minnesota Ph.D. dissertation, March 2011. Major: Computer science. Advisor: Rui Kuang, Ph.D.
Dates: 2011-03 (issued); 2011-05-18 (accessioned and made available).
https://hdl.handle.net/11299/104581
1 computer file (PDF); xiv, 99 pages.

Advances in genomics, proteomics, and molecular pathology using high-throughput technologies have produced vast datasets identifying thousands of genes whose genomic changes differ between diseased and normal samples. Many statistical and machine learning methods have been developed to discover biomarkers with potential clinical value, but building reliable learning models that use high-throughput datasets to discover biomarkers for predicting clinical outcomes remains a key challenge in genomic research. This thesis introduces network-based learning algorithms that make better use of large-scale genomic data and integrate it with biological prior knowledge to understand the role of genetic changes in human diseases. The first method, NetProp (Network Propagation), is a graph-based semi-supervised feature classification algorithm that identifies discriminative biomarkers by learning on bipartite graphs in the analysis of high-dimensional genomic data. The second method, HyperPrior, is a hypergraph-based semi-supervised learning algorithm that integrates genomic data with known biological prior knowledge for biomarker identification and patient outcome prediction. The third method, MINProp, is a general graph-based learning algorithm that integrates multiple genomic and network data sources for disease gene discovery. While these methods can be applied to discover candidate biomarkers in high-throughput genomic studies, validating the candidate biomarkers is another challenging problem in genomic research. To address it, we introduce a network-based method, rcNet (rank coherence in Network), to elucidate associations between diseases and genes.
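The abstract describes NetProp and MINProp only at a high level and does not give their update rules, bipartite graph construction, or multi-network coupling. As a rough illustration of the general idea of propagating scores over a network, the following is a minimal sketch of a standard normalized label propagation scheme (the one known as "learning with local and global consistency"), iterating f ← αSf + (1 − α)y with S = D^(-1/2) W D^(-1/2). It is not the thesis's algorithm; the function names, the toy path graph, and α = 0.8 are illustrative assumptions.

```python
import math


def normalize_adjacency(W):
    """Return S = D^{-1/2} W D^{-1/2} for a symmetric, non-negative adjacency matrix W.

    Isolated nodes (degree 0) get zero rows and columns instead of dividing by zero.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def propagate(W, y, alpha=0.8, tol=1e-10, max_iter=1000):
    """Iterate f <- alpha * S f + (1 - alpha) * y until the scores stop changing.

    W: symmetric adjacency matrix (list of lists).
    y: initial scores, e.g. 1.0 for known (seed) genes and 0.0 elsewhere.
    alpha: hypothetical smoothing weight in (0, 1); larger values spread scores further.
    Returns the propagated score of every node.
    """
    S = normalize_adjacency(W)
    n = len(y)
    f = list(y)
    for _ in range(max_iter):
        new_f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
        # Stop once the largest per-node change falls below the tolerance.
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f


if __name__ == "__main__":
    # Toy path graph 0-1-2-3 with node 0 as the only labeled seed.
    W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    y = [1.0, 0.0, 0.0, 0.0]
    print([round(s, 4) for s in propagate(W, y)])
```

In this toy run the seed keeps the highest score and scores decay with graph distance from it. The weight α trades off smoothness over the network against agreement with the initial labels; because α < 1 and the spectral radius of S is at most 1, the iteration converges.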
We applied these methods to large and diverse real datasets, including microarray gene expression profiles, single nucleotide polymorphisms (SNPs), and DNA copy number variations. Our methods identified novel biomarkers with clinical or biological relevance to the disease and achieved classification performance competitive with baseline methods. Our method also successfully validated associations between diseases and potential disease-causing genes discovered in high-throughput studies. The results indicate that methods that exploit global topological information in networks and integrate data with biological prior knowledge could help discover genetic determinants of human disease and reveal its underlying biological principles.

Language: en-US
Keywords: Biomarker; Cancer genomics; Data integration; Disease-gene association; Molecular network; Network-based learning algorithm; Computer Science
Type: Thesis or Dissertation