Leveraging Summary Statistics and Integrative Analysis for Prediction and Inference in Genome-Wide Association Studies
2020-07
Authors
Pattee, Jack
Type
Thesis or Dissertation
Abstract
Genome-wide association studies (GWASs) have been highly successful in dissecting the genetic etiology of complex traits. GWAS analyses have identified many genetic variants associated with a wide range of traits, and polygenic risk scores estimated from GWAS results have been used to predict certain clinical phenotypes effectively. Despite these accomplishments, GWASs face pervasive limitations in statistical power and interpretability. To address these limitations, we develop novel, powerful approaches for prediction and inference on genetic and genomic data. Our approaches focus on two key elements.

The first is the incorporation of additional sources of genetic and genomic data. A typical GWAS characterizes the genetic basis of a trait in terms of associations between the trait and individual single nucleotide polymorphisms (SNPs). This single-SNP approach is often underpowered and difficult to interpret biologically; effectively incorporating other sources of genetic and genomic data into the single-SNP analysis framework can often increase both power and interpretability. The second is the development of methods that apply broadly when only summary statistics are available. Many published GWASs do not release individual-level genetic and genomic data and instead provide only summary statistics, so we design our methods to operate on summary statistics alone, without requiring individual-level data.

We first develop a novel approach that integrates somatic and germline information from tumors to identify genes associated with lung cancer risk, and we use it to discover potentially novel lung cancer genes. We then investigate how to estimate powerful and parsimonious polygenic risk score models from summary statistics. We develop a set of novel methods for model estimation, model selection, and the assessment of model performance, and demonstrate their beneficial properties in extensive simulations and in applications to GWASs of lung cancer, blood lipid levels, and height. Lastly, we integrate our polygenic risk score methods into a two-sample two-stage least squares framework to identify potentially novel endophenotypes associated with increased risk of Alzheimer's disease. We demonstrate via simulation and real data application that this approach is powerful and effective.
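As a minimal illustration of two ideas the abstract combines, a polygenic risk score built from per-SNP summary-statistic effect sizes and a two-sample two-stage least squares (2SLS) analysis that uses such a score as its first stage, the following Python sketch runs on a toy simulation. Everything in it is an assumption for illustration: the helper names (`marginal_gwas_betas`, `polygenic_score`, `two_sample_2sls`), the effect sizes, the confounding structure, and the independent (no-LD) SNPs. It is not the dissertation's estimation, model-selection, or inference procedure, which the abstract does not specify.

```python
import numpy as np


def marginal_gwas_betas(dosages, trait):
    """Per-SNP OLS slopes of the trait on each SNP, mimicking GWAS summary statistics."""
    xc = dosages - dosages.mean(axis=0)
    yc = trait - trait.mean()
    return (xc.T @ yc) / (xc ** 2).sum(axis=0)


def polygenic_score(dosages, betas):
    """PRS_i = sum_j beta_j * x_ij, where x_ij in {0, 1, 2} is an allele dosage."""
    return dosages @ betas


def two_sample_2sls(exposure_betas, dosages, outcome):
    """Stage 1: predict the exposure in the outcome sample with a PRS whose weights
    come from a separate exposure GWAS. Stage 2: regress the outcome on that prediction."""
    predicted = polygenic_score(dosages, exposure_betas)
    design = np.column_stack([np.ones_like(predicted), predicted])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]  # slope on the predicted exposure = 2SLS effect estimate


# Hypothetical simulation settings (not taken from the dissertation).
rng = np.random.default_rng(2020)
n, p, maf = 20_000, 20, 0.3
gamma = np.full(p, 0.2)  # SNP -> exposure effects
theta = 0.5              # true causal effect of exposure on outcome


def simulate(n_samples):
    g = rng.binomial(2, maf, size=(n_samples, p)).astype(float)  # independent SNPs, no LD
    u = rng.normal(size=n_samples)  # unmeasured confounder of exposure and outcome
    exposure = g @ gamma + u + rng.normal(size=n_samples)
    outcome = theta * exposure + 0.8 * u + rng.normal(size=n_samples)
    return g, exposure, outcome


# Sample 1: keep only exposure GWAS summary statistics (per-SNP effect sizes).
g1, exposure1, _ = simulate(n)
exposure_betas = marginal_gwas_betas(g1, exposure1)

# Sample 2: genotypes and outcome; 2SLS never uses this sample's exposure.
g2, exposure2, outcome2 = simulate(n)
theta_2sls = two_sample_2sls(exposure_betas, g2, outcome2)

# For comparison only: naive regression of outcome on the confounded exposure.
theta_ols = np.polyfit(exposure2, outcome2, 1)[0]

print(f"2SLS estimate: {theta_2sls:.3f}  naive OLS estimate: {theta_ols:.3f}")
```

In this setup the 2SLS estimate should land near the simulated `theta = 0.5`, while the naive regression is pulled upward by the shared confounder `u`. Because the simulated SNPs are independent, marginal GWAS effect sizes can serve directly as score weights; with real data, linkage disequilibrium and the choice of which SNPs to include must be handled, and naive second-stage OLS standard errors are not valid for 2SLS, so the sketch reports point estimates only.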
Description
University of Minnesota Ph.D. dissertation. July 2020. Major: Biostatistics. Advisor: Wei Pan. 1 computer file (PDF); xi, 145 pages.
Suggested citation
Pattee, Jack. (2020). Leveraging Summary Statistics and Integrative Analysis for Prediction and Inference in Genome-Wide Association Studies. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/216351.