Browsing by Author "Hwang, TaeHyun"
Now showing 1 - 5 of 5

Identifying Clinical and Genetic Markers of Human Disease by Classifying Features on Graphs (2007-09-26)
Hwang, TaeHyun; Sicotte, Hugues; Wigle, Dennis; Kocher, Jean-Pierre; Kumar, Vipin; Kuang, Rui
Identification of clinical and genetic markers of disease can provide crucial information for both disease treatment and etiology. This complex task involves associating high-dimensional patterns, such as large-scale gene expressions and single nucleotide polymorphisms (SNPs), with disease-related phenotypes using very few samples. We introduce a new graph-based semi-supervised feature classification algorithm that identifies discriminative patterns by learning on bipartite graphs built from clinical variables, gene expressions and SNPs. Instead of performing feature selection or unsupervised bi-clustering, our algorithm directly classifies the feature nodes in a bipartite graph as positive, negative or neutral with network propagation, which captures the interactions between samples and features (clinical and genetic variables) by exploring the global structure of the graph. Although globally optimized for classifying the features, our algorithm can also simultaneously classify the test samples for disease prognosis/diagnosis. We apply our algorithm to the Rosetta breast cancer dataset and to a chronic fatigue syndrome dataset from the CAMDA contest. Our algorithm identifies interesting clinical and genetic markers, some of which are consistent with previous studies in the literature, and achieves better overall classification performance than support vector machines and Bayesian networks.
(Supplemental website: http://compbio.cs.umn.edu/Feature_Class/.)

Inferring Disease and Gene Set Associations with Rank Coherence in Networks (2011-01-18)
Hwang, TaeHyun; Zhang, Wei; Xie, MaoQiang; Kuang, Rui
A computational challenge in validating the candidate disease genes identified in a high-throughput genomic study is elucidating the associations between the set of candidate genes and disease phenotypes. Conventional gene set enrichment analysis often fails to reveal associations between disease phenotypes and gene sets that contain only a short list of poorly annotated genes, because the existing annotations of disease-causative genes are incomplete. We propose a network-based computational approach called rcNet to discover the associations between gene sets and disease phenotypes. Assuming coherent associations between the genes ranked by their relevance to the query gene set and the disease phenotypes ranked by their relevance to the hidden target disease phenotypes of the query gene set, we formulate a learning framework that maximizes the rank coherence with respect to the known disease phenotype-gene associations. An efficient algorithm coupling ridge regression with label propagation, along with two variants, is introduced to find the optimal solution of the framework. We evaluated the rcNet algorithms and existing baseline methods with both leave-one-out cross-validation and a task of predicting recently discovered disease-gene associations in OMIM. The experiments demonstrated that the rcNet algorithms achieved the best overall rankings compared to the baselines. To further validate the reproducibility of the performance, we applied the algorithms to identify the target diseases of novel candidate disease genes obtained from recent studies of GWAS, DNA copy number variation analysis, and gene expression profiling. The algorithms ranked the target disease of the candidate genes at the top of the rank list in many cases across all three case studies.
The rcNet algorithms are available as a webtool for disease and gene set association analysis at http://compbio.cs.umn.edu/dgsa_rcNet.

MCTA: Target Tracking Algorithm based on Minimal Contour in Wireless Sensor Networks (2007-01-26)
Jeong, Jaehoon; Hwang, TaeHyun; He, Tian; Du, David Hung-Chang
This paper proposes a minimal contour tracking algorithm (MCTA) that reduces the sensing and communication energy consumed when tracking mobile targets in wireless sensor networks. MCTA conserves energy by letting only a minimum number of sensor nodes participate in communication and perform sensing for target tracking. MCTA uses a minimal tracking area based on vehicular kinematics. Modeling the target's kinematics allows for pruning out the part of the tracking area that the mobile target cannot mechanically reach within the scheduled time. MCTA therefore sends the tracking area information only to the sensor nodes within the minimal tracking area and wakes them up. Compared to the legacy scheme, which uses a circle-based tracking area, our proposed scheme uses fewer sensors for both communication and sensing without missing the target. Through simulation, we show that MCTA outperforms the circle-based scheme with about 60% energy saving under certain ideal situations.

Prioritizing Disease Genes with Label Propagation on a Heterogeneous Network (2009-09-08)
Hwang, TaeHyun; Kuang, Rui
Evidence from recent studies suggests that disease-causative genes can be identified more accurately from the modular structures in a heterogeneous network that integrates a disease phenotype similarity subnetwork, a gene-gene interaction subnetwork and a phenotype-gene association subnetwork. However, exploring a heterogeneous network comprising several subnetworks is a challenging machine learning problem, since each subnetwork contains its own cluster structures that need to be explored independently.
We introduce a general regularization framework and an intuitive and efficient algorithm called MINProp for propagating information between an arbitrary number of subnetworks in a heterogeneous network. Our algorithm performs label propagation on each individual subnetwork with the current label information derived from all the subnetworks, and repeats this step until convergence to the global optimal solution of the convex objective function in the regularization framework. In simulations, we show that MINProp can significantly improve the ranking task by removing the biases introduced by the discrepancy among the subnetworks. We then tested MINProp for disease gene prioritization on a large-scale heterogeneous network containing 8919 genes and 5080 OMIM phenotypes. MINProp achieved competitive or better overall gene ranking performance than CIPHER and random walk with restart, two best-performing methods for disease gene prioritization, in both leave-one-out cross-validation and the case study of discovering new disease phenotype-gene associations added to OMIM after May 2007. We also validated that MINProp can specifically improve the ranking of those disease genes in the dense modules of the gene-gene interaction network. Furthermore, MINProp revealed an interesting global modular structure of human disease phenotype-gene associations and new associations that were only recently reported.

Reconstructing Disease Phenome-genome Association by Bi-Random Walk (2013-08-16)
Xie, MaoQiang; Hwang, TaeHyun; Kuang, Rui
Promising results were recently reported in using network information from the phenotype-similarity network and the gene-interaction network with graph-based learning to derive new disease phenotype-gene associations. However, a more fundamental understanding of how the network information is relevant to disease phenotype-gene associations is lacking.
In this paper, we analyze the circular bigraphs (CBGs) in OMIM phenotype-gene association networks, and introduce a bi-random walk (BiRW) algorithm to capture the CBG patterns in the networks for unveiling the associations between the complete collection of disease phenotypes (phenome) and genes. BiRW performs separate random walks simultaneously on the gene interaction network and the phenotype similarity network to explore gene paths and phenotype paths in CBGs of different sizes. In the analysis of OMIM associations, we discovered that 81% of the associations are covered by CBG patterns of path-length up to 3, with variability across 21 disease classes, and that there is a clear correlation between the CBG coverage and the predictability of the phenotype-gene associations. Some prominent examples are cancers, nutritional diseases, dermatological diseases, bone diseases, cardiovascular diseases and respiratory diseases. Experiments on recovering known associations in cross-validation and predicting new associations in a test set validated that BiRW effectively improved prediction performance over existing methods by ranking more known associations in the top 100 out of more than 12,000 candidate genes. The investigation of the global disease phenome-genome association map also revealed interesting new predictions and phenotype-gene modules by disease classes.
Availability: http://compbio.cs.umn.edu/BiRW
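
The idea of walking on the phenotype network and the gene network at the same time can be illustrated with a small sketch. This is not the authors' implementation: the update rule R <- alpha * P R G + (1 - alpha) * A, the symmetric degree normalization, the parameter names (alpha, n_iter, tol) and the toy matrices are all assumptions chosen for illustration; the abstract does not specify these details.

```python
import numpy as np

def sym_normalize(W):
    """Symmetric degree normalization D^-1/2 W D^-1/2 (an assumed choice)."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]

def bi_random_walk(P, G, A, alpha=0.8, n_iter=50, tol=1e-9):
    """Hypothetical bi-random-walk sketch.

    P: phenotype-phenotype similarity (n_pheno x n_pheno)
    G: gene-gene interaction network  (n_gene x n_gene)
    A: known phenotype-gene associations (n_pheno x n_gene)
    Returns a score matrix with the same shape as A.
    """
    P_norm = sym_normalize(P)
    G_norm = sym_normalize(G)
    A = np.asarray(A, dtype=float)
    total = A.sum()
    A0 = A / total if total > 0 else A
    R = A0.copy()
    for _ in range(n_iter):
        # One step on the phenotype side (left) and one on the gene side
        # (right), blended with the known associations as a restart term.
        R_next = alpha * (P_norm @ R @ G_norm) + (1 - alpha) * A0
        converged = np.abs(R_next - R).max() < tol
        R = R_next
        if converged:
            break
    return R

# Toy data (made up): two similar phenotypes, three genes in a chain 0-1-2.
P = np.array([[0, 1],
              [1, 0]])
G = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
A = np.array([[1, 0, 0],   # phenotype 0 is known to involve gene 0
              [0, 0, 0]])  # phenotype 1 has no known genes
scores = bi_random_walk(P, G, A)
print(scores)
```

In this toy setup, phenotype 1 has no known genes, yet it receives a nonzero score for gene 1 because it is similar to phenotype 0 and gene 1 interacts with gene 0. Ranking each row of the score matrix gives a candidate gene list per phenotype.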