Genome-wide association studies (GWASs) have had substantial success in dissecting the genetic etiology of complex traits. They have identified many genetic variants associated with a wide range of traits, and polygenic risk scores estimated from GWASs have been used to predict certain clinical phenotypes effectively. Despite these accomplishments, GWASs suffer from pervasive problems of power and interpretability. To address these problems, we develop novel approaches for prediction and inference on genetic and genomic data.
Our approaches focus on two key elements. The first is the incorporation of additional sources of genetic and genomic data. A typical GWAS characterizes the genetic basis of a trait in terms of associations between the trait and a set of single nucleotide polymorphisms (SNPs). This approach is often underpowered, and its results can be difficult to interpret biologically. We can often increase both power and interpretability by incorporating other sources of genetic and genomic data into the single-SNP analysis structure. The second is the development of methods that are widely applicable in the context of summary statistics. Many published GWAS analyses do not provide individual-level genetic and genomic data and instead provide only summary statistics. We therefore want our methods to work flexibly with summary statistics, without the need for individual-level data.
We first develop a novel approach to integrating somatic and germline information from tumors to identify genes associated with lung cancer risk, and we leverage this approach to discover potentially novel genes associated with lung cancer. We then investigate the problem of estimating powerful and parsimonious models for polygenic risk scores in the context of summary statistics. We develop a set of novel methods for model estimation, model selection, and the assessment of model performance, and demonstrate their beneficial properties in extensive simulations and in applications to GWASs of lung cancer, blood lipid levels, and height. Lastly, we integrate our methods for polygenic risk score estimation into a two-sample two-stage least squares analysis framework to identify potentially novel endophenotypes associated with increased risk of Alzheimer's disease. We demonstrate via simulation and real data application that our approach is powerful and effective.
University of Minnesota Ph.D. dissertation. July 2020. Major: Biostatistics. Advisor: Wei Pan. 1 computer file (PDF); xi, 145 pages.
Leveraging Summary Statistics and Integrative Analysis for Prediction and Inference in Genome-Wide Association Studies.
Retrieved from the University of Minnesota Digital Conservancy,