Statistical methods for genetic and epigenetic studies

Bai, Yun
2016-12-19
2016-09
https://hdl.handle.net/11299/183400

University of Minnesota Ph.D. dissertation. September 2016. Major: Biostatistics. Advisors: Weihua Guan, Haitao Chu. 1 computer file (PDF); ix, 89 pages.

A common theme of many current large-scale genetic and epigenetic studies is their high-throughput nature: they interrogate hundreds of thousands of genetic markers simultaneously. Inherent in these large-scale measurements is technical variation of no biological interest. Pre-processing methods are typically applied to remove this technical variation and other unwanted variation (e.g., batch effects) so that unbiased estimates can be obtained. Most statistical methods then treat the processed measures as an error-free gold standard in downstream analysis. In this thesis, we aim to develop unified modeling approaches that incorporate this technical variation into downstream statistical analysis. Motivated by the Atherosclerosis Risk In Communities (ARIC) Study, we develop alternative statistical methods that account for technical variation when analyzing epigenome-wide methylation data. Specifically, we study the reproducibility of the methylation measures (Chapter 3) and epigenome-wide association studies (Chapter 4) while incorporating this technical variation.

Like epigenome-wide methylation data, single nucleotide polymorphism (SNP) data provide another genome-wide measure of genetic markers. In the past decade, genome-wide association studies (GWAS) have found thousands of SNPs associated with various diseases. Most large-scale GWAS have taken a marginal association test approach, testing the association of each trait with each marker individually. GWAS summary statistics (e.g., association test statistics) are generally posted publicly. However, the raw genotype and phenotype data are more difficult to share publicly because of privacy and logistical reasons. It is therefore desirable to develop statistical methods that can mine these publicly available summary data to gain additional insights. In this thesis, we develop a statistical method that needs only the summary data from multiple GWAS conducted on the same cohort (i.e., the same genotype data with multiple traits) to identify additional genetic variants associated with the outcomes (Chapter 2).

en
Thesis or Dissertation