Browsing by Subject "Mixture model"
Item
Network-based mixture models for genomic data. (2009-06) Wei, Peng
A common task in genomic studies is to identify genes satisfying certain conditions, such as differentially expressed genes between normal and tumor tissues or regulatory target genes of a transcription factor (TF). Standard approaches treat all the genes identically and independently a priori and ignore the fact that genes work coordinately in biological processes as dictated by gene networks, leading to inefficient analysis and reduced power. We propose incorporating gene network information as prior biological knowledge into statistical modeling of genomic data to maximize the power for biological discoveries. We propose a spatially correlated mixture model based on the use of latent Gaussian Markov random fields (GMRF) to smooth gene-specific prior probabilities in a mixture model over a network, assuming that neighboring genes in a network are functionally more similar to each other. In addition, we propose a Bayesian implementation of a discrete Markov random field (DMRF)-based mixture model for incorporating gene network information, and compare its performance with that based on Gaussian Markov random fields. We also extend the network-based mixture models to ones that are able to integrate multiple gene networks and diverse types of genomic data, such as protein-DNA binding, gene expression and DNA sequence data, to accurately identify regulatory target genes of a TF. Applications to high-throughput microarray data, along with simulations, demonstrate the utility of the new methods and the statistical efficiency gains over other methods.

Item
Statistical methods for gene set based significance analysis. (2011-07) Lee, Sang Mee
Gene set enrichment analysis (GSEA) is a method to identify groups of genes that are statistically more differentially expressed than all other genes across different treatments within a microarray study. Most of the existing approaches have largely relied on nonparametric methods and require repeated permutation and resampling of the data to assess the significance of a gene set. In this dissertation, we study parametric approaches for GSEA by formulating the enrichment analysis as a simple model comparison problem. The methods not only gain flexibility in statistical modeling corresponding to biological problems but also achieve computational efficiency. First, we propose a likelihood-based approach assuming a finite mixture model for a two-class comparison problem, with the analysis implemented through a likelihood ratio based test. In addition, we extend the parametric methods to flexible two-component mixture models for one-sided enrichment analysis, which aims to test for enrichment of up (or down) regulation only. Also, we develop chi-square mixture models which incorporate the idea of two-class comparison studies into multiple category microarray experiments. Applications to gene expression data, along with simulations, demonstrate the computational efficiency and the competitive performance of the proposed methods.

Item
Survey sampling and multiple stratifications (2013-09) Zimmerman, Patrick Lennon Kendall
In survey sampling, stratified random sampling and post-stratification can increase the precision of estimation. In some cases, however, there may be multiple ways to stratify a population. We present a method, based on a non-informative Bayesian approach, that uses a finite mixture model to incorporate information from each stratification into estimation. This approach works well when the response variable is categorical or discrete, and for some non-response types of problems. We provide the theoretical basis for our method, present some simulation results, discuss various extensions, and define some software that implements the method.