Robust Variance Component Models and Powerful Variable Selection Methods for Addressing Missing Heritability






Thesis or Dissertation


The development of a complex human disease is an intricate interplay of genetic and environmental factors. Broadly speaking, “heritability” is defined as the proportion of total trait variance due to genetic factors within a given population. Over the past 50 years, studies involving monozygotic and dizygotic twins have estimated the heritability of over 17,800 human traits [1]. Genetic association studies that measure thousands to millions of genetic “markers” have attempted to determine the exact markers that explain a given trait’s heritability. However, the identified set of “statistically significant” markers often fails to explain more than 10% of the estimated heritability of a trait [2], a gap that has been termed the “missing heritability” problem [3][4]. “Missing heritability” implies that many genetic markers that contribute to disease risk are still waiting to be discovered. Identifying the exact genetic markers associated with a disease is important for the development of pharmaceutical drugs that may target these markers (see [5] for recent examples). Additionally, “missing heritability” may imply that we are inaccurately estimating heritability in the first place [3, 4, 6], which motivates the development of more robust models for estimating heritability. This dissertation focuses on two objectives that attempt to address the missing heritability problem: (1) develop a more robust framework for estimating heritability; and (2) develop powerful association tests in an attempt to find more genetic markers associated with a given trait. Specifically, in Chapter 2, robust variance component models are developed for estimating heritability in twin studies using second-order generalized estimating equations (GEE2). We demonstrate that GEE2 can improve coverage rates of the true heritability parameter for non-normally distributed outcomes, and can easily incorporate both mean-level and variance-level covariate effects (e.g., letting heritability vary by sex or age).
In Chapter 3, penalized regression is used to jointly model all genetic markers. It is demonstrated that jointly modeling all markers can improve power to detect individual associated markers compared to conventional methods that model each marker “one-at-a-time.” Chapter 4 expands on this work by developing a more flexible nonparametric Bayesian variable selection model that can account for non-linear or non-additive effects, and can also test biologically meaningful groups of markers for an association with the outcome. We demonstrate how the nonparametric Bayesian method can detect markers with complex association structures that more conventional models might miss.
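To make the definition of heritability in twin studies concrete, the following is a minimal sketch of Falconer's classic estimator, h² = 2(r_MZ − r_DZ), where r_MZ and r_DZ are the trait correlations within monozygotic and dizygotic twin pairs. This is a standard textbook approach, not the GEE2 variance component model developed in the dissertation. The function names and the simulation settings (additive genetic effects only, no shared environment, normally distributed traits) are assumptions made purely for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def falconer_heritability(mz_pairs, dz_pairs):
    """Falconer's estimate: twice the gap between MZ and DZ twin correlations."""
    r_mz = pearson([a for a, _ in mz_pairs], [b for _, b in mz_pairs])
    r_dz = pearson([a for a, _ in dz_pairs], [b for _, b in dz_pairs])
    return 2.0 * (r_mz - r_dz)


def simulate_twin_pairs(n_pairs, h2, genetic_corr, rng):
    """Hypothetical simulation: trait = sqrt(h2) * genetic + sqrt(1 - h2) * noise.

    genetic_corr is the correlation of the twins' genetic components
    (1.0 for MZ pairs, 0.5 for DZ pairs under an additive model).
    """
    pairs = []
    for _ in range(n_pairs):
        shared = rng.gauss(0, 1)
        g1 = math.sqrt(genetic_corr) * shared + math.sqrt(1 - genetic_corr) * rng.gauss(0, 1)
        g2 = math.sqrt(genetic_corr) * shared + math.sqrt(1 - genetic_corr) * rng.gauss(0, 1)
        y1 = math.sqrt(h2) * g1 + math.sqrt(1 - h2) * rng.gauss(0, 1)
        y2 = math.sqrt(h2) * g2 + math.sqrt(1 - h2) * rng.gauss(0, 1)
        pairs.append((y1, y2))
    return pairs


rng = random.Random(0)
mz = simulate_twin_pairs(20000, h2=0.6, genetic_corr=1.0, rng=rng)
dz = simulate_twin_pairs(20000, h2=0.6, genetic_corr=0.5, rng=rng)
h2_hat = falconer_heritability(mz, dz)
print(round(h2_hat, 2))  # should land near the simulated value of 0.6
```

Under these assumptions the MZ correlation is approximately h² and the DZ correlation approximately h²/2, so the estimator recovers the simulated heritability. The simple moment estimator gives no direct way to include covariates or to handle non-normal outcomes, which is the kind of limitation the robust models described above are meant to address.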


University of Minnesota Ph.D. dissertation. August 2018. Major: Biostatistics. Advisor: Saonli Basu. 1 computer file (PDF); x, 127 pages.


Suggested citation

Arbet, Jaron. (2018). Robust Variance Component Models and Powerful Variable Selection Methods for Addressing Missing Heritability. Retrieved from the University Digital Conservancy,
