University Digital Conservancy

Browsing by Subject "Multivariate traits"

    Two topics in association analysis of DNA sequencing data: population structure and multivariate traits
    (2013-08) Zhang, Yiwei
    As next-generation sequencing technologies become mature and affordable, we now have access to massive data on single nucleotide variants (SNVs) with varying minor allele frequencies (MAFs). This creates new opportunities, as more information from the human genome is available. However, new challenges also arise, such as how to utilize SNVs with low MAFs. With current intensive efforts in association testing to detect genetic loci associated with common diseases and complex traits, two issues are of primary interest: reducing spurious findings and increasing power for true discoveries. In association testing, a major cause of the elevated level of false positives is the confounding effect of population structure, the so-called population stratification. As a remedy, one popular method is to add principal components (PCs) as covariates in a regression model, known as principal component regression (PCR). Yet it is not clear how PCR will work in testing rare variants (RVs, with MAF < 0.01), or with population stratification on a fine scale. More questions arise, such as which types and which sets of SNVs should be used to construct PCs, and whether methods other than principal component analysis (PCA) are better for constructing PCs. Utilizing the DNA sequencing data from the 1000 Genomes Project, we first investigate whether PCR is adequate in adjusting for population stratification while maintaining high power when testing low frequency variants (LFVs, with 0.01 ≤ MAF < 0.05) and RVs. Furthermore, we compare the performance of two dimension reduction methods, PCA and spectral dimension reduction (SDR), as well as twelve different types and sets of variants for constructing PCs. The comparison is conducted with respect to controlling population stratification on a fine scale. On the other hand, linear mixed models (LMMs) have emerged for their superior performance in handling complex population structures. Herein, we examine the connection and difference between PCR and LMMs based on the formulation of probabilistic PCA, and propose a hybrid method combining the two. Its outstanding performance in addressing both population structure and environmental confounders is established by simulations using the Genetic Analysis Workshop (GAW) 18 data and the 1000 Genomes Project data. Lastly, we consider boosting power for association analysis of multivariate traits. A new class of tests, the sum of powered score (SPU) tests, and an adaptive SPU (aSPU) test are extended to the generalized estimating equations (GEE) framework. We apply the new and some existing methods to association testing on both common variants (CVs) and RVs with an HIV/AIDS dataset and the GAW 18 data.
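
The PCR adjustment mentioned in the abstract (adding PCs as covariates in a regression model) can be illustrated with a minimal sketch. This is not the thesis's code or data: the two-population simulation, the allele frequencies, the effect size, and the helper names `top_pcs` and `snp_effect` are all assumptions made up for illustration, and plain least squares stands in for whatever test statistic the author actually used.

```python
import numpy as np


def top_pcs(genotypes, k):
    """Return the top-k principal component scores of a samples x SNVs matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the column-centered matrix; u scaled by s gives the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def snp_effect(y, snp, covariates=None):
    """Least-squares slope of the trait on one SNV, optionally adjusted for covariates."""
    cols = [np.ones_like(y), snp]
    if covariates is not None:
        cols.extend(covariates.T)  # one column per covariate
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Hypothetical simulation: two populations with different allele frequencies,
# and a trait that depends only on population membership (pure confounding).
rng = np.random.default_rng(0)
n_per_pop, n_snps = 200, 200
pop = np.repeat([0, 1], n_per_pop)
freq = np.where(pop == 0, 0.2, 0.8)[:, None] * np.ones((1, n_snps))
genotypes = rng.binomial(2, freq).astype(float)
y = 2.0 * pop + rng.normal(size=pop.size)

# No SNV is causal here, so any apparent effect of SNV 0 is spurious.
beta_raw = snp_effect(y, genotypes[:, 0])
beta_adj = snp_effect(y, genotypes[:, 0], top_pcs(genotypes, 2))
print("unadjusted slope:", beta_raw)
print("PC-adjusted slope:", beta_adj)
```

In this toy setting the leading PC largely tracks population membership, so including it as a covariate should shrink the spurious slope toward zero. Whether the same holds for rare variants or fine-scale structure is exactly the question the abstract raises, and this sketch does not address it.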

© 2025 Regents of the University of Minnesota. All rights reserved. The University of Minnesota is an equal opportunity educator and employer.