Browsing by Author "Dey, Sanjoy"

A pattern mining based integrative framework for biomarker discovery
(2012-02-10) Dey, Sanjoy; Atluri, Gowtham; Steinbach, Michael; MacDonald, Angus; Lim, Kelvin; Kumar, Vipin
Recent advances in high-throughput data collection technologies have resulted in the availability of diverse biomedical datasets that capture complementary information pertaining to the biological processes in an organism. Biomarkers that are discovered by integrating these datasets obtained from case-control studies have the potential to elucidate the biological mechanisms behind complex human diseases. In this paper, we define an interaction-type integrative biomarker as one whose features together can explain the disease, but not individually. We propose a pattern mining based integrative framework (PAMIN) to discover interaction-type integrative biomarkers from diverse case-control datasets. PAMIN first finds patterns from individual datasets to capture the available information separately and then combines these patterns to find integrated patterns (IPs) consisting of variables from multiple datasets. We further use several interestingness measures to characterize the IPs into specific categories. Using synthetic data, we compare the IPs found using our approach with those of CCA and discriminative-CCA (dCCA). Our results indicate that PAMIN can discover interaction-type patterns that competing approaches like CCA and dCCA cannot find. Using real datasets, we also show that PAMIN discovers a larger number of statistically significant IPs than the competing approaches.
Finding Integrative Biomarkers from Biomedical Datasets: An application to Clinical and Genomic Data
(2015-08) Dey, Sanjoy
Human diseases, such as cancer, diabetes and schizophrenia, are inherently complex and governed by the interplay of various underlying factors ranging from genetic and genomic influences to environmental effects. Recent advancements in high-throughput data collection technologies in bioinformatics have resulted in a dramatic increase in diverse datasets that can provide information about such factors related to diseases. These types of data include DNA microarrays providing cellular information, Single Nucleotide Polymorphisms (SNPs) providing genetic information, metabolomics data in terms of proteins and other metabolites, structural and functional brain data from magnetic resonance imaging (MRI), and electronic health records (EHRs) containing copious information about histo-pathological factors, demographics, and environmental effects. Despite their richness, each of these datasets provides information about only a part of the complex biological mechanism behind human diseases. Thus, effective integration of the partial information in these genomic and clinical data can help reveal disease complexities in greater detail by generating new data-driven hypotheses beyond the traditional hypotheses about biomarkers. In particular, integrative biomarkers, i.e., patterns of features that are predictive of disease and that go beyond the simple biomarkers derived from a single dataset, can lead to a customized and more effective approach to improving healthcare. This thesis focuses on addressing the key issues related to integrative biomarkers by developing new data mining approaches. One very important issue in biomarker discovery is that the models have to be easily interpretable, i.e., integrative models have to be not only predictive of the disease, but also interpretable enough that domain experts can infer useful knowledge from the obtained patterns.
In one such effort to make models interpretable, domain information about disease relationships was used as prior knowledge during model development. In addition, a novel metric called I-score was proposed using medical literature to quantify the interpretability of the obtained patterns. Another key issue of integrative biomarker discovery is that there may be many potential relationships present among diverse datasets. For example, a very important type of relationship in biomarker discovery is interaction, which describes biomarkers spanning multiple datasets whose combined features are more indicative of disease than the individual constituent factors. In particular, the individual effects of each type of factor on disease predisposition can be small and thus remain undetected by most disease association techniques performed on individual datasets. Different types of relationships are explored and an association analysis based framework is proposed to discover them. The proposed framework is especially effective for discovering higher-order relationships, which cannot be found by the existing prominent integrative approaches for biomarker discovery. When applied to real datasets collected from three different types of data from schizophrenic and normal subjects, this approach yielded significant integrated biomarkers which are biologically relevant. Disease heterogeneity creates further issues for integrative biomarker discovery: biomarkers obtained from clinicogenomic studies may not be applicable to all patients to the same degree, i.e., a disease may consist of multiple subtypes, each occurring in different subpopulations. Some potential reasons for disease heterogeneity are different pathways playing different roles in the same disease and confounding factors such as age, ethnicity and race, or genetic predisposition, which can be available in rich EHR data.
Most biomarker discovery techniques use full-space model development techniques, i.e., they assess the performance of biomarkers on all patients without finding the distinct subpopulations. In this thesis, more customized models were built depending on patients' characteristics to handle disease heterogeneity. In summary, several data mining techniques developed in this thesis advance the state of the art in integration of diverse biomedical datasets. Moreover, their applications on large-scale EHR data yield significant discoveries, which can ultimately lead to generating new data-driven hypotheses for inferring meaningful information about complex disease mechanisms.
Integration of Clinical and Genomic data: a Methodological Survey
(2013-02-20) Dey, Sanjoy; Gupta, Rohit; Steinbach, Michael; Kumar, Vipin
Human diseases are inherently complex and governed by the complicated interplay of several underlying factors. Clinical research focuses on behavioral, demographic and pathology information, whereas molecular genomics focuses on finding underlying genetic and genomic factors in genomic data collected on mRNA expression, proteomics, biological networks, and other microbiological features. However, each of these clinical and genomic datasets contains information only about one particular aspect of a complex disease, rather than covering all of the several complicated underlying risk factors. This has led to a new area of research that integrates both clinical and genomic data and aims to extract more information about diseases by considering not only all the various factors, but also the interactions among those factors, which cannot be captured by clinical and genomic studies that are performed independently of each other. Although initial efforts have already been made to develop such integrative modeling of the clinical and genomic data to shed light on the biological mechanism of the diseases, the research field is still in a rudimentary stage. In this review article, we survey the general issues, challenges and current work of clinicogenomic studies. We also summarize the current state of the field and discuss some possibilities for future work.

University Digital Conservancy
