Xue, Haoran
2021-08-16
2021-08-16
2021-05
https://hdl.handle.net/11299/223167
University of Minnesota Ph.D. dissertation. May 2021. Major: Statistics. Advisors: Xiaotong Shen, Wei Pan. 1 computer file (PDF); viii, 102 pages.
There has been increasing interest in instrumental variables regression for causal inference. In genetics, transcriptome-wide association studies (TWAS), also known as PrediXcan, have recently emerged as a widely applied tool to discover causal/target genes by integrating an outcome GWAS dataset with another gene expression/transcriptome GWAS (called eQTL) dataset; they can not only boost statistical power but also offer biological insights by identifying (putative) causal genes for a GWAS trait, e.g., low-density lipoprotein cholesterol (LDL). Statistically, TWAS apply (two-sample) two-stage least squares (2SLS) with multiple correlated SNPs as instrumental variables (IVs) to predict/impute gene expression, in contrast to typical (two-sample) Mendelian randomization (MR) approaches, which use independent SNPs as IVs and are expected to have lower power. However, some of the SNPs used may not be valid IVs because of their (horizontal) pleiotropic/direct effects on the trait that are not mediated through the gene of interest, leading to false conclusions by TWAS (or MR). We propose a general inferential method for possibly high-dimensional data that accounts for confounding and invalid IVs while simultaneously selecting valid IVs via two-stage constrained maximum likelihood; we develop a theory for the likelihood method subject to a truncated L1-constraint approximating the L0-constraint for asymptotically valid and efficient statistical inference on causal effects. We demonstrate both theoretically and numerically the superior performance of the proposed method over the standard 2SLS/TWAS and other methods.
We apply the methods to identify causal genes for LDL by integrating GWAS summary data with eQTL data.
en
Causal Inference; Genome-Wide Association Studies; Mendelian Randomization; Truncated L1-constraint; TWAS
Constrained Likelihood Inference in Instrumental Variable Regression with Invalid Instruments and Its Application to GWAS Summary Data
Thesis or Dissertation
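The abstract's point that invalid IVs can mislead standard 2SLS can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions that are not taken from the dissertation: one-sample individual-level data, five independent SNP-like instruments, and a single instrument with a direct (pleiotropic) effect on the outcome. The function names `two_stage_least_squares` and `simulate`, and all parameter values, are hypothetical. The "oracle" adjustment assumes the invalid instrument is known in advance; it is not the constrained-likelihood selection procedure the dissertation proposes.

```python
import numpy as np


def two_stage_least_squares(Z, x, y, adjust=None):
    """Plain 2SLS slope estimate.

    Stage 1 regresses the exposure x on the instruments Z.
    Stage 2 regresses the outcome y on the fitted exposure, optionally
    together with extra covariates in `adjust` (hypothetical argument).
    """
    n = len(x)
    Z1 = np.column_stack([np.ones(n), Z])
    gamma, *_ = np.linalg.lstsq(Z1, x, rcond=None)
    x_hat = Z1 @ gamma
    cols = [np.ones(n), x_hat]
    if adjust is not None:
        cols.append(adjust)
    X2 = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return coef[1]  # coefficient on the fitted exposure


def simulate(n=20000, beta=0.5, direct_effect=0.0, seed=0):
    """Simulated data; all values here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    Z = rng.binomial(2, 0.3, size=(n, 5)).astype(float)  # SNP-like genotypes
    u = rng.normal(size=n)                               # unobserved confounder
    x = Z @ np.full(5, 0.3) + u + rng.normal(size=n)     # exposure (e.g. expression)
    # direct_effect != 0 makes the first instrument invalid (pleiotropic)
    y = beta * x + direct_effect * Z[:, 0] + u + rng.normal(size=n)
    return Z, x, y


# All instruments valid: 2SLS should land near the true effect of 0.5.
Z, x, y = simulate(direct_effect=0.0)
est_valid = two_stage_least_squares(Z, x, y)

# One invalid instrument: the naive 2SLS estimate is pushed away from 0.5.
Z, x, y = simulate(direct_effect=0.5)
est_invalid = two_stage_least_squares(Z, x, y)

# Oracle fix: adjust for the instrument known to be invalid in stage 2.
est_oracle = two_stage_least_squares(Z, x, y, adjust=Z[:, [0]])

print(f"valid IVs: {est_valid:.3f}")
print(f"one invalid IV, naive 2SLS: {est_invalid:.3f}")
print(f"one invalid IV, oracle adjustment: {est_oracle:.3f}")
```

In practice the invalid instruments are unknown, which is why a data-driven selection step such as the one described in the abstract is needed in place of the oracle adjustment used here.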