One of the major goals of microarray data analysis is to identify differentially expressed genes. In cancer studies, RNA is extracted from tissue samples of cancer patients (case class) and healthy people (control class) to obtain gene expression data, and genes that are differentially expressed between case and control are identified as candidate biomarkers for further study. More often, however, we encounter situations where gene expression is compared across more than two classes rather than in the traditional case/control setup, e.g., multiple disease stages or different experimental conditions. This dissertation addresses the problem of identifying differentially expressed genes in a multi-class comparison setting.
To identify differentially expressed genes, it is important to select a test statistic to rank the genes. Common approaches summarize the expression data of each gene into a univariate test statistic and find a critical value of the ranking statistic to decide which genes are declared differentially expressed. In this dissertation, a univariate test statistic (the moderated F-statistic) is first used as a summary statistic, and its distribution is empirically estimated using maximum likelihood. Then, a multivariate test statistic is proposed as a summary statistic for each gene, and both parametric and non-parametric empirical Bayes approaches are adopted to rank the genes. The performance of the proposed methods is illustrated by extensive simulation studies and applications to public microarray datasets. The results show that the proposed methods have better detection power than commonly used approaches when the false discovery rate is controlled at the same level.
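As a point of reference for the univariate ranking idea described above, the following minimal sketch ranks genes by the ordinary one-way ANOVA F-statistic across three classes. It is an illustration only: it uses the ordinary F-statistic rather than the moderated F-statistic, it does not implement the empirical Bayes methods proposed in the dissertation, and it does not choose a critical value. The function names, gene names, and toy expression values are all hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """Ordinary one-way ANOVA F-statistic for a single gene.

    groups: list of lists, one list of expression values per class.
    Assumes at least two classes and non-zero within-class variation.
    """
    k = len(groups)                       # number of classes
    n = sum(len(g) for g in groups)       # total number of samples
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expr):
    """Return gene names sorted by decreasing F-statistic."""
    scores = {gene: one_way_f(groups) for gene, groups in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up for illustration): three classes, three samples each.
expr = {
    "geneA": [[1.0, 1.2, 0.9], [3.0, 3.1, 2.8], [5.0, 5.2, 4.9]],
    "geneB": [[2.0, 2.5, 1.5], [2.1, 1.9, 2.4], [2.2, 1.8, 2.0]],
}
print(rank_genes(expr))
```

In this toy example the class means of geneA differ strongly while those of geneB are nearly equal, so geneA is ranked first. Deciding how far down such a ranking to go while controlling the false discovery rate is the question the methods in this dissertation are meant to address.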