Statistical methods for genetic and epigenetic studies

A theme common to many current large-scale genetic and epigenetic studies is their high-throughput nature: hundreds of thousands of genetic markers are interrogated simultaneously. Such large-scale measurements inevitably carry technical variation of no biological interest. Pre-processing methods are typically applied to remove this technical variation, along with other unwanted variation (e.g., batch effects), so that unbiased estimates can be obtained. Most downstream statistical analyses, however, treat these processed measures as an error-free gold standard. In this thesis, we aim to develop unified modeling approaches that accommodate this technical variation in downstream statistical analysis. Motivated by the Atherosclerosis Risk in Communities (ARIC) Study, we develop alternative statistical methods that incorporate technical variation into the analysis of epigenome-wide methylation data. Specifically, we study the reproducibility of the methylation measures (Chapter 3) and epigenome-wide association studies (Chapter 4) while incorporating this technical variation.

Like epigenome-wide methylation data, single nucleotide polymorphism (SNP) data provide another genome-wide measure of genetic markers. In the past decade, genome-wide association studies (GWAS) have found thousands of SNPs associated with various diseases. Most large-scale GWAS have taken a marginal association approach, testing the association between each trait and each marker individually. GWAS summary statistics (e.g., association test statistics) are generally posted publicly, but the raw genotype and phenotype data are more difficult to share because of privacy concerns and various logistical obstacles. It is therefore desirable to develop statistical methods that can mine these publicly available summary data for additional insights.
In this thesis, we develop a statistical method that requires only the summary data from multiple GWAS conducted on the same cohort (i.e., the same genotype data with multiple traits) to identify additional genetic variants associated with the outcomes (Chapter 2).
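The sketch below is a hedged illustration of why summary statistics from several GWAS on one cohort can be combined. It uses a generic multi-trait omnibus test and is not claimed to be the Chapter 2 method. Because the K traits are measured on the same individuals, a SNP's vector of marginal z-scores z is, under the null of no association, approximately multivariate normal with a correlation matrix R that mirrors the phenotypic correlation. Assuming most SNPs are null, R can be estimated from the z-scores themselves, and the statistic T = z^T R^{-1} z is compared with a chi-square distribution on K degrees of freedom. All function names and the simulated genotypes and traits are hypothetical.

```python
import math

import numpy as np


def marginal_z(genotypes: np.ndarray, traits: np.ndarray) -> np.ndarray:
    """Marginal (one SNP, one trait at a time) association z-scores.

    genotypes: (n, m) allele counts; traits: (n, K). Returns an (m, K) array.
    Uses the simple-regression t-statistic r * sqrt((n - 2) / (1 - r**2)),
    treated as approximately standard normal for large n.
    """
    n = genotypes.shape[0]
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    y = (traits - traits.mean(axis=0)) / traits.std(axis=0)
    r = g.T @ y / n  # Pearson correlation for every SNP-trait pair
    return r * np.sqrt((n - 2) / (1.0 - r**2))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variable for integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))
    # Odd df: two-sided normal tail plus a finite correction series.
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1)
    )
    return tail


def omnibus_test(z_snp: np.ndarray, R: np.ndarray) -> tuple[float, float]:
    """Joint test T = z^T R^{-1} z, approximately chi-square(K) under the null."""
    stat = float(z_snp @ np.linalg.solve(R, z_snp))
    return stat, chi2_sf(stat, z_snp.size)


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n, m, K = 2000, 1000, 2
    G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    # Correlated residuals mimic two traits measured on the same individuals.
    Y = rng.multivariate_normal(np.zeros(K), [[1.0, 0.5], [0.5, 1.0]], size=n)
    # Hypothetical causal SNP 0 with small, opposite-signed effects.
    Y += np.outer(G[:, 0], [0.08, -0.08])
    Z = marginal_z(G, Y)  # the per-trait "summary statistics"
    R = np.corrcoef(Z, rowvar=False)  # assumes nearly all SNPs are null
    marginal_p = [chi2_sf(z**2, 1) for z in Z[0]]
    stat, p = omnibus_test(Z[0], R)
    print("SNP 0 marginal p-values:", [f"{q:.2e}" for q in marginal_p])
    print(f"SNP 0 omnibus T = {stat:.1f}, p = {p:.2e}")
```

In this simulation the causal SNP has effects of opposite sign on two positively correlated traits, a setting where the joint test can be considerably more significant than either marginal test. With same-signed effects on positively correlated traits, the gain is smaller or may even reverse.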

Description

University of Minnesota Ph.D. dissertation. September 2016. Major: Biostatistics. Advisors: Weihua Guan, Haitao Chu. 1 computer file (PDF); ix, 89 pages.

Collections

Dissertations

Suggested citation

BAI, YUN. (2016). Statistical methods for genetic and epigenetic studies. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/183400.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.

University Digital Conservancy

University of Minnesota Twin Cities