Discovering combinatorial disease biomarkers

Fang, Gang
Issued: 2012-08. Deposited: 2013-10-10.
https://hdl.handle.net/11299/157997
University of Minnesota Ph.D. dissertation. August 2012. Major: Computer Science. Advisor: Vipin Kumar. 1 computer file (PDF); vii, 182 pages.

Many diseases have a genetic component. Some, including many cancers, are caused by a change in the functioning of a gene or a group of genes in a person's cells. Disease-biomarker discovery seeks to find associations between diseases and a person's genetic or related molecular characteristics, such as genes, DNA mutations, methylation, non-coding RNAs, proteins, metabolic products, and biological pathways. These biomarkers, such as the mutations in the BRCA1 and BRCA2 genes that indicate a high risk of breast cancer, can help in understanding the mechanisms that cause a disease and can guide diagnosis, prognosis, and treatment. With the recent availability of high-throughput "-omics" and next-generation sequencing data, biomarker discovery is shifting from hypothesis-driven analysis toward data-driven analysis, which enables the discovery of previously unsuspected genetic associations for a variety of diseases. However, for most diseases a substantial gap remains between the disease risk explained by the discovered loci and the total heritable risk estimated from familial aggregation, a problem that has been referred to as "missing heritability". While there are several possible explanations for missing heritability, genetic interactions between loci are one potential culprit. A genetic interaction generally refers to two or more genes whose joint contribution to a phenotype goes beyond their independent effects; such interactions are expected to play an important role in complex diseases.
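To make "beyond the independent effects" concrete, the following minimal sketch uses synthetic, hypothetical genotype data (not from the thesis) in which two binary loci each show no marginal association with disease status, yet their combination separates cases from controls perfectly. The XOR-like disease model and the helper names `case_rate`, `marginal_case_rates`, and `joint_case_rates` are illustrative assumptions, chosen as an extreme case of a purely interactive effect.

```python
from itertools import product

# Hypothetical toy data: each sample is (locus_a, locus_b, is_case).
# Disease status is assumed to be the XOR of two binary loci, an extreme
# example where neither locus matters on its own.
samples = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2) for _ in range(25)]

def case_rate(rows):
    """Fraction of the given rows that are cases."""
    return sum(row[2] for row in rows) / len(rows)

def marginal_case_rates(locus):
    """Case rate among carriers (value 1) and non-carriers (value 0) of one locus."""
    carriers = [row for row in samples if row[locus] == 1]
    non_carriers = [row for row in samples if row[locus] == 0]
    return case_rate(carriers), case_rate(non_carriers)

def joint_case_rates():
    """Case rate for each of the four two-locus genotype combinations."""
    return {
        (a, b): case_rate([row for row in samples if row[0] == a and row[1] == b])
        for a, b in product([0, 1], repeat=2)
    }

if __name__ == "__main__":
    print("locus A:", marginal_case_rates(0))  # (0.5, 0.5): no marginal signal
    print("locus B:", marginal_case_rates(1))  # (0.5, 0.5): no marginal signal
    print("joint:", joint_case_rates())        # each combination is all-case or all-control
```

A single-locus scan of these data would report nothing, which is one way interactions could contribute to heritability that marginal analyses miss.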
This thesis takes a data-mining approach, specifically discriminative pattern mining, and targets the computational discovery of combinatorial biomarkers associated with complex human diseases from a variety of large-scale case-control genomic datasets. It addresses several key challenges confronted by existing discriminative pattern mining algorithms: computational complexity, sample heterogeneity due to disease subtypes, and the lack of statistical power in most real datasets. It also proposes a novel concept for organizing discriminative patterns into an interaction network that allows the discovery of high-level structural knowledge at both global and local scales. Specifically, a general framework is proposed to detect pairs of interacting pathways that are enriched for genetic-level interactions in genome-wide association datasets. Validation across independent real datasets not only demonstrates the reliability of the proposed framework but also yields several interesting biological insights into complex diseases such as breast cancer and Parkinson's disease. The algorithmic data-mining contributions of this thesis also hold promise for addressing generic challenges in domains beyond biology.

Language: en-US
Keywords: Combinatorial search; Data integration; Disease biomarkers; Disease heterogeneity; Statistical power; Systems biology
Type: Thesis or Dissertation
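As a rough illustration of discriminative pattern mining on case-control data, the sketch below enumerates small combinations of binary features (here, hypothetical SNPs carrying a risk allele) and keeps those whose support among cases exceeds their support among controls by a threshold. The support-difference score, the brute-force enumeration, the function name `mine_discriminative_patterns`, and the toy data are all assumptions chosen for clarity; this naive search ignores the computational-complexity, heterogeneity, and statistical-power issues the abstract identifies, and it is not the thesis's own algorithm.

```python
from itertools import combinations

def support(pattern, rows):
    """Fraction of rows (each a set of present features) containing every feature in pattern."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if pattern <= row) / len(rows)

def mine_discriminative_patterns(cases, controls, max_size=2, min_diff=0.3):
    """Enumerate all feature combinations up to max_size and keep those whose
    support in cases exceeds support in controls by at least min_diff.
    Exhaustive search: the number of candidates grows combinatorially."""
    features = sorted(set().union(*cases, *controls))
    results = []
    for k in range(1, max_size + 1):
        for combo in combinations(features, k):
            pattern = frozenset(combo)
            diff = support(pattern, cases) - support(pattern, controls)
            if diff >= min_diff:
                results.append((combo, round(diff, 3)))
    # Most discriminative first; ties broken by pattern name for a stable order.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results

# Hypothetical toy data: each sample is the set of SNPs carrying a risk allele.
CASES = [{"snp1", "snp2"}, {"snp1", "snp2", "snp3"}, {"snp1", "snp2"}, {"snp3"}]
CONTROLS = [{"snp1"}, {"snp2"}, {"snp1", "snp3"}, {"snp2", "snp3"}]

if __name__ == "__main__":
    print(mine_discriminative_patterns(CASES, CONTROLS, max_size=2, min_diff=0.2))
    # [(('snp1', 'snp2'), 0.75), (('snp1',), 0.25), (('snp2',), 0.25)]
```

In this toy example the pair snp1 and snp2 scores 0.75, three times the score of either SNP alone, which is the kind of combinatorial signal a single-locus scan would understate. With thousands of SNPs, raising `max_size` quickly makes exhaustive enumeration infeasible.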