Browsing by Author "Wang, Wen"
Item
A Computationally Efficient and Statistically Powerful Framework for Searching High-order Epistasis with Systematic Pruning and Gene-set Constraints (2010-06-21)
Fang, Gang; Haznadar, Majda; Wang, Wen; Steinbach, Michael; Van Ness, Brian; Kumar, Vipin
This paper has not yet been submitted.

Item
Characterizing Discriminative Patterns (2011-02-18)
Fang, Gang; Wang, Wen; Oatley, Benjamin; Van Ness, Brian; Steinbach, Michael; Kumar, Vipin
Discriminative patterns are association patterns that occur with disproportionate frequency in some classes versus others, and have been studied under names such as emerging patterns and contrast sets. Such patterns have demonstrated considerable value for classification and subgroup discovery, but a detailed understanding of the types of interactions among items in a discriminative pattern is lacking. To address this issue, we propose to categorize discriminative patterns according to four types of item interaction: (i) driver-passenger, (ii) coherent, (iii) independent additive and (iv) synergistic beyond independent additive. The coherent, additive, and synergistic patterns are of practical importance, with the latter two representing a gain in the discriminative power of a pattern over its subsets. Synergistic patterns are the most restrictive, but perhaps the most interesting, since they capture a cooperative effect that is more than the sum of the effects of the individual items in the pattern. For domains such as biomedical and genetic research, differentiating among these types of patterns is critical since each yields a very different biological interpretation. For general domains, the characterization provides a novel view of the nature of the discriminative patterns in a dataset, which yields insights beyond those provided by current approaches that focus mostly on pattern-based classification and subgroup discovery.
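The idea of a pattern occurring "with disproportionate frequency in some classes versus others", and of a pattern showing "a gain in the discriminative power ... over its subsets", can be illustrated with a small sketch. The abstract does not name the measure of discriminative power, so the code below assumes the difference in relative support between two classes; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def rel_support(pattern, transactions):
    """Fraction of transactions that contain every item in `pattern`."""
    if not transactions:
        return 0.0
    pattern = frozenset(pattern)
    return sum(pattern <= t for t in transactions) / len(transactions)


def diff_support(pattern, class_a, class_b):
    """Assumed measure: relative support in class A minus that in class B."""
    return rel_support(pattern, class_a) - rel_support(pattern, class_b)


def gain_over_subsets(pattern, class_a, class_b):
    """Difference between the pattern's score and the best score
    among its non-empty proper subsets (0.0 if it has none)."""
    pattern = frozenset(pattern)
    best_subset = max(
        (diff_support(sub, class_a, class_b)
         for k in range(1, len(pattern))
         for sub in combinations(pattern, k)),
        default=0.0,
    )
    return diff_support(pattern, class_a, class_b) - best_subset


# Hypothetical toy data: each transaction is a set of items.
cases = [frozenset("ab"), frozenset("ab"), frozenset("ab"),
         frozenset("a"), frozenset("b")]
controls = [frozenset("a"), frozenset("a"), frozenset("b"),
            frozenset("b"), frozenset()]

print(diff_support("ab", cases, controls))
print(gain_over_subsets("ab", cases, controls))
```

In this toy data the pair {a, b} separates the classes better than either item alone, so the gain is positive. Telling the four pattern types apart would require the paper's own definitions, which this sketch does not attempt to reproduce.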
This paper presents a comprehensive discussion that defines these four pattern types and investigates their properties and their relationship to one another. In addition, these ideas are explored for a variety of datasets (ten UCI datasets, one gene expression dataset and two genetic-variation datasets). The results demonstrate the existence, characteristics and statistical significance of the different types of patterns. They also illustrate how pattern characterization can provide novel insights into discriminative pattern mining and the discriminative structure of different datasets. Code for pattern characterization and supplementary documents are available at http://vk.cs.umn.edu/CDP

Item
Construction and Functional Analysis of Human Genetic Interaction Networks with Genome-wide Association Data (2011-01-18)
Fang, Gang; Wang, Wen; Paunic, Vanja; Oatley, Benjamin; Haznadar, Majda; Steinbach, Michael; Van Ness, Brian; Myers, Chad L.; Kumar, Vipin
Motivation: Genetic interaction measures how different genes collectively contribute to a phenotype, and can reveal functional compensation and buffering between pathways under genetic perturbations. Recently, genome-wide investigation of genetic interactions has revealed genetic interaction networks that provide novel insights both when analyzed independently and when integrated with other functional genomic datasets. For higher eukaryotes such as humans, the above reverse-genetics approaches are not straightforward, since the phenotypes of interest, such as disease onset or survival, are difficult to study in a cell-based assay.
Results: In this paper, we propose a general framework for constructing and analyzing human genetic interaction networks from genome-wide single nucleotide polymorphism (SNP) datasets used for case-control studies on complex diseases.
Specifically, we propose a general approach with three major steps: (1) estimating SNP-SNP genetic interactions, (2) identifying linkage disequilibrium (LD) blocks and mapping SNP-SNP interactions to LD block-block interactions, and (3) functional mapping for LD blocks. We performed two sets of functional analyses for each of the six case-control SNP datasets used in the paper, and demonstrated that (i) genes in LD blocks showing similar interaction profiles tend to be functionally related, and (ii) the network can be used to discover pairs of compensatory gene modules (between-pathway models) in their joint association with a disease phenotype. The proposed framework should provide novel insights beyond existing approaches that either ignore interactions between SNPs or model different SNP-SNP pairs with genetic interactions separately. Furthermore, our study provides evidence that some of the core properties of genetic interaction networks based on reverse genetics in model organisms like yeast are also present in genetic interactions revealed by natural variation in human populations.
Availability: Supplementary material is available at http://vk.cs.umn.edu/humanGI

Item
Genetic Interactions and Complex Human Diseases (2017-08)
Wang, Wen
Genetic interactions have been reported to underlie phenotypes in a variety of systems, but the extent to which they contribute to complex disease in humans remains unclear. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions, but existing methods for identifying them from GWAS data tend to focus on testing individual locus pairs, which undermines statistical power. Importantly, the global genetic networks mapped for a model eukaryotic organism revealed that genetic interactions often connect genes between compensatory functional modules in a highly coherent manner.
Taking advantage of this expected structure, we developed a computational approach called BridGE that identifies pathways connected by genetic interactions from GWAS data. We applied the BridGE approach to seven different diseases and discovered significant interactions in six of them: Parkinson’s disease, schizophrenia, hypertension, prostate cancer, breast cancer, and type 2 diabetes. Our novel approach provides a general framework for mapping complex genetic networks underlying human disease from genome-wide genotype data. An application of BridGE with a focus on breast cancer was also extensively explored. We applied the BridGE method to six independent breast cancer cohorts and identified significant pathway-level interactions in five cohorts. Joint analysis across all five cohorts revealed a high-confidence consensus set of genetic interactions with support in multiple cohorts. The discovered interactions implicated the glutathione conjugation, vitamin D receptor, purine metabolism, mitotic prometaphase, and steroid hormone biosynthesis pathways as major modifiers of breast cancer risk. Notably, while many of the pathways identified by BridGE show clear relevance to breast cancer, variants in these pathways had not been previously discovered by traditional single-variant association tests or single-pathway enrichment analyses that do not consider SNP-SNP interactions. Finally, we describe an application of the BridGE framework to test a specific hypothesis derived from studies of genetic interactions in yeast, which found that the proteasome complex was a genetic interaction hub. Given that proteasome function is highly conserved between yeast and humans, we predicted that natural variation in the homologous human proteasome genes would be involved in a number of disease-modifying genetic interactions.
Using BridGE, we evaluated genetic interactions across seven different diseases, and indeed found that the proteasome pathway was the top positive interaction hub among ~800 pathways examined. Overall, this thesis demonstrates the potential for novel computational approaches to translate systems-level insights across species to better elucidate the genetic basis of human disease.

Item
Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data (2009-04-02)
Fang, Gang; Pandey, Gaurav; Wang, Wen; Gupta, Manish; Steinbach, Michael; Kumar, Vipin
Discriminative patterns can provide valuable insights into datasets with class labels that may not be available from the individual features or from predictive models built using them. Most existing approaches work efficiently for sparse or low-dimensional datasets. However, for dense and high-dimensional datasets, they have to use high thresholds to produce the complete results within limited time, and thus may miss interesting low-support patterns. In this paper, we address the necessity of trading off the completeness of discriminative pattern discovery against the efficient discovery of low-support discriminative patterns from such datasets. We propose a family of anti-monotonic measures named SupMaxK that organize the set of discriminative patterns into nested layers of subsets, which are progressively more complete in their coverage, but require increasingly more computation. In particular, the member of SupMaxK with K = 2, named SupMaxPair, is suitable for dense and high-dimensional datasets. Several experiments on a cancer gene expression dataset demonstrate that there are low-support patterns that can be discovered using SupMaxPair, but not by existing approaches, and that these patterns are statistically significant and biologically relevant. This illustrates the complementarity of SupMaxPair to existing approaches for discriminative pattern discovery.
The code and dataset for this paper are available at http://vk.cs.umn.edu/SMP/.
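The abstract above names the SupMaxK family but does not state its formula. As a rough sketch only, the code below assumes the measure takes the form: relative support of the pattern in one class, minus the largest relative support in the other class among the pattern's size-K subsets (K = 2 for SupMaxPair). This form is an assumption for illustration and should be checked against the paper; the function names and toy data are hypothetical.

```python
from itertools import combinations


def rel_support(pattern, transactions):
    """Fraction of transactions that contain every item in `pattern`."""
    if not transactions:
        return 0.0
    pattern = frozenset(pattern)
    return sum(pattern <= t for t in transactions) / len(transactions)


def sup_max_k(pattern, class_a, class_b, k=2):
    """Assumed form of SupMaxK (k=2 gives SupMaxPair):
    support of the pattern in class A minus the maximum support
    in class B over all size-k subsets of the pattern."""
    pattern = frozenset(pattern)
    if len(pattern) < k:
        raise ValueError("pattern must contain at least k items")
    max_sub = max(rel_support(sub, class_b)
                  for sub in combinations(pattern, k))
    return rel_support(pattern, class_a) - max_sub


# Hypothetical toy data: each transaction is a set of items.
cases = [frozenset("abc"), frozenset("abc"), frozenset("abc"),
         frozenset("ab"), frozenset("c")]
controls = [frozenset("ab"), frozenset("bc"), frozenset("a"),
            frozenset("c"), frozenset()]

print(sup_max_k("ab", cases, controls))   # pair
print(sup_max_k("abc", cases, controls))  # superset of the pair
```

Under this assumed form, extending a pattern can only lower the first term and raise the second, so the value never increases for supersets. That is the anti-monotonic property the abstract mentions, and it is what would allow a search to prune every superset of a pattern that falls below a threshold.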