Browsing by Subject "Association testing"
Item: Powerful Association Testing with Application to Neuroimaging Genetics (2017-05), Xu, Zhiyuan

In spite of the huge success of the standard single-nucleotide polymorphism (SNP) based analysis in genome-wide association studies (GWASs), it has some limitations. First, it suffers power loss from a stringent significance level due to multiplicity adjustment for up to millions of tests. In addition, it has low power since the effect sizes of SNPs are usually small. Instead, gene-based testing might improve statistical power by aggregating moderately to weakly associated SNPs within each gene while greatly reducing the burden of multiple testing adjustment from millions to thousands. Second, almost all existing analyses do not explicitly account for (unknown) genetic heterogeneity, leading to possible loss of power as convincingly shown in simulation studies (Londono et al., 2012; Qian and Shao, 2013; Zhou and Pan, 2009). Moreover, as there are many other data resources available (e.g. neuroimaging phenotypes, molecular phenotypes like gene expression) besides GWAS/DNA sequencing data, integrating them into GWAS is expected to boost statistical power. We first introduce a flexible framework to extend score-based testing in generalized linear models to more complex models, for example, mixed effect models. Second, we show that by accounting for genetic heterogeneity, more associated SNPs can be detected than with the standard one-degree-of-freedom trend test in single SNP-based testing. Third, we propose a new adaptive aSPC test to detect associations between two random vectors in moderate to high dimensions; we also point out its connections to some existing association tests for multiple SNPs and multiple traits. Finally, we propose a novel gene-based association testing approach by incorporating weights derived from other data resources (e.g. from another eQTL dataset). We show the power gain of the new approach over two existing methods, PrediXcan and TWAS, pointing out that both PrediXcan and TWAS are special cases of our new test.

Item: Two topics in association analysis of DNA sequencing data: population structure and multivariate traits (2013-08), Zhang, Yiwei

As next-generation sequencing technologies become mature and affordable, we now have access to massive data on single-nucleotide variants (SNVs) with varying minor allele frequencies (MAFs). This poses new opportunities, as more information from the human genome is available. However, new challenges also arise, such as how to utilize those SNVs with low MAFs. With current intensive efforts in association testing to detect genetic loci associated with common diseases and complex traits, two issues are of primary interest: reducing spurious findings and increasing power for true discoveries. In association testing, a major cause of the elevated level of false positives is the confounding effect of population structure, the so-called population stratification. As a remedy, one popular method is to add principal components (PCs) to a regression model, named principal component regression (PCR). Yet, it is not clear how PCR will work in testing rare variants (RVs, with MAF < 0.01), or with population stratification at a fine scale. More questions arise, such as what types and what sets of SNVs should be used to construct PCs, and whether there are better methods than principal component analysis (PCA) for constructing PCs. Utilizing the DNA sequencing data from the 1000 Genomes project, we first investigate whether PCR is adequate in adjusting for population stratification while maintaining high power when testing low frequency variants (LFVs, with 0.01 ≤ MAF < 0.05) and RVs. Furthermore, we compare the performance of two dimension reduction methods, PCA and spectral dimension reduction (SDR), as well as twelve different types and sets of variants for constructing PCs. The comparison is conducted with respect to controlling population stratification at a fine scale. On the other hand, linear mixed models (LMMs) have emerged owing to their superior performance in handling complex population structures. Herein, we examine the connection and difference between PCR and LMM based on the formulation of probabilistic PCA, and propose a hybrid method combining the two. Its outstanding performance in addressing both population structure and environmental confounders is established by simulations using the Genetic Analysis Workshop (GAW) 18 data and the 1000 Genomes project data. Lastly, we consider boosting power for association analysis of multivariate traits. A new class of tests, the sum of powered score tests (SPU), and an adaptive SPU (aSPU) test are extended to the generalized estimating equations (GEE) framework. We apply the new and some existing methods to association testing on both common variants (CVs) and RVs with an HIV/AIDS dataset and the GAW 18 data.
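
To give a rough sense of how the SPU and aSPU tests named in the second abstract operate, here is a minimal Python sketch. It is not the thesis's GEE-based implementation: it assumes a single quantitative trait with no covariates, takes the score vector to be U = X'(y - mean(y)), defines SPU(γ) as the sum of the γ-th powers of the score components (with γ = ∞ read as the largest absolute component), and calibrates everything by permuting the trait. The function names `spu_stats` and `aspu_test`, the set of γ values, and the number of permutations are all illustrative choices.

```python
import numpy as np


def spu_stats(U, gammas):
    """SPU(gamma) = sum_j U_j**gamma; gamma = inf is taken as max_j |U_j|."""
    stats = []
    for g in gammas:
        if np.isinf(g):
            stats.append(np.max(np.abs(U)))
        else:
            stats.append(np.sum(U ** g))
    return np.array(stats)


def aspu_test(X, y, gammas=(1, 2, 3, 4, np.inf), n_perm=999, seed=0):
    """Permutation p-values for each SPU(gamma) and for the adaptive aSPU test.

    Simplifying assumptions (not from the source): one quantitative trait,
    no covariates, score vector U = X^T (y - mean(y)).
    """
    rng = np.random.default_rng(seed)
    resid = y - y.mean()  # residuals under the null model (intercept only)

    # Observed statistics; absolute values give a two-sided comparison.
    t_obs = np.abs(spu_stats(X.T @ resid, gammas))

    # Null statistics from permuting the residuals across individuals.
    t_perm = np.empty((n_perm, len(gammas)))
    for b in range(n_perm):
        t_perm[b] = np.abs(spu_stats(X.T @ rng.permutation(resid), gammas))

    # Per-gamma permutation p-values for the observed data.
    p_obs = (1 + (t_perm >= t_obs).sum(axis=0)) / (n_perm + 1)

    # p-value of each permuted statistic relative to all permutations,
    # reusing the same permutations (each statistic counts itself once).
    ranks = (t_perm[None, :, :] >= t_perm[:, None, :]).sum(axis=1)
    p_perm = ranks / n_perm

    # aSPU: take the smallest p-value over gamma, then recalibrate it.
    min_p_obs = p_obs.min()
    p_aspu = (1 + (p_perm.min(axis=1) <= min_p_obs).sum()) / (n_perm + 1)
    return p_obs, p_aspu


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.binomial(2, 0.3, size=(200, 10)).astype(float)  # simulated genotypes
    y = X[:, 0] + rng.normal(size=200)                      # one causal variant
    p_by_gamma, p_adaptive = aspu_test(X, y)
    print(dict(zip(["1", "2", "3", "4", "inf"], p_by_gamma)), p_adaptive)
```

Small γ values weight all variants roughly equally, while large γ values concentrate on the strongest signals; the adaptive step lets the data choose among them at the cost of one extra layer of permutation calibration.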