Incorporating biological knowledge of genes into microarray data analysis.






Thesis or Dissertation


Microarray data analysis has become one of the most active research areas in bioinformatics over the past twenty years. An important application of microarray technology is to reveal relationships between gene expression profiles and various clinical phenotypes. A defining characteristic of microarray data is the so-called "large p, small n" problem, in which the number of genes p far exceeds the number of samples n, making parameter estimation difficult. Most traditional statistical methods developed in this area aim to overcome this difficulty, and the most popular technique is to use an L1-norm penalty to introduce sparsity into the model. However, most of these methods treat all genes equally, as they would ordinary covariates. Recent developments in gene functional studies have revealed complicated relationships among genes from a biological perspective: genes can be categorized into biological functional groups or pathways. Such biological knowledge, together with microarray gene expression profiles, provides information about relationships not only between genes and clinical outcomes but also among the genes themselves. Utilizing this information could potentially improve both predictive power and gene selection. The importance of incorporating biological knowledge into the analysis has been increasingly recognized in recent years, and several new methods have been developed. In our study, we focus on incorporating biological information, such as Gene Ontology (GO) annotations and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, into microarray data analysis for the purpose of prediction. Our first method implements this idea by specifying different L1 penalty terms for different gene functional groups. Our second method models a covariance matrix for the genes by assuming stronger within-group correlations and weaker between-group correlations. The third method models spatial correlations among the genes over a gene network in a Bayesian framework.
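To make the idea behind the first method concrete, the following is a minimal sketch of an L1-penalized linear regression in which each gene's penalty depends on its functional group. It is an illustration only and not the dissertation's actual model or fitting procedure: the squared-error loss, the coordinate-descent solver, the function names, the group labels, and the penalty values are all assumptions chosen for the example.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def group_weighted_lasso(X, y, groups, group_penalty, n_iter=200):
    """Coordinate descent for
        (1 / (2n)) * ||y - X b||^2 + sum_j lambda[groups[j]] * |b_j|
    X is a list of n rows with p columns; groups[j] is the (hypothetical)
    functional-group label of gene j; group_penalty maps label -> lambda.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of gene j with the partial residual (gene j added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, group_penalty[groups[j]]) / col_scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data: two orthogonal "genes"; y = 2 * gene0 + 0.5 * gene1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.5, -1.5, 1.5, -2.5]
groups = ["pathway_A", "pathway_B"]  # hypothetical group labels

# A light penalty on pathway_A and a heavy one on pathway_B.
beta_grouped = group_weighted_lasso(
    X, y, groups, {"pathway_A": 0.1, "pathway_B": 1.0})
# The same light penalty on every group, i.e. an ordinary lasso.
beta_uniform = group_weighted_lasso(
    X, y, groups, {"pathway_A": 0.1, "pathway_B": 0.1})
```

In this toy example the heavier penalty on `pathway_B` removes the second gene from the model, while the uniform penalty keeps both genes with slightly shrunken coefficients. In practice the group-specific penalties would have to be chosen by some tuning procedure such as cross-validation, which this sketch leaves out.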


University of Minnesota Ph.D. dissertation. April 2009. Major: Biostatistics. Advisor: Wei Pan. 1 computer file (PDF); v, 91 pages.


Suggested citation

Tai, Feng. (2009). Incorporating biological knowledge of genes into microarray data analysis. Retrieved from the University Digital Conservancy,
