Browsing by Subject "Causal inference"
Now showing 1 - 6 of 6
Item Bayesian Causal Inference in Meta-Analysis (2019-05) Zhou, Jincheng
While the randomized clinical trial (RCT) is the gold standard for investigating the effect of a medical intervention, noncompliance with assigned treatments can threaten a trial's validity. Noncompliance, if not appropriately controlled, can introduce substantial bias into the estimate of treatment effect. The complier average causal effect (CACE) approach provides a useful tool for addressing noncompliance, where CACE measures the effect of an intervention in the latent subgroup of the study population that complies with its assigned treatment (the compliers). Meta-analysis of RCTs has become a widely used statistical technique to combine and contrast results from multiple independent studies. However, no existing methods can effectively deal with heterogeneous noncompliance in meta-analysis of RCTs. For example, the commonly used meta-regression methods investigate the impact of study-level variables (e.g., mean age of the study population) on the study-specific treatment effect size by assuming the study-level covariates to be fixed. However, noncompliance rates generally differ between treatment groups within a study and are commonly considered random rather than fixed post-randomization variables. In addition, noncompliance may dynamically interact with the primary outcome and thus affect the response to treatment. Thus, meta-regression methods are not suitable for controlling for noncompliance. This thesis focuses on developing Bayesian methods to estimate CACE in meta-analysis of RCTs with binary or ordinal outcomes. Bayesian hierarchical random effects models are developed to appropriately account for the inherent heterogeneity in treatment effect and noncompliance between studies and treatment groups. We first present a Bayesian hierarchical model to estimate the CACE where heterogeneous compliance rates are available for each study.
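The thesis's models are fully Bayesian and hierarchical, but the CACE quantity itself can be illustrated with the textbook moment estimator: under the standard instrumental-variable assumptions (randomization, exclusion restriction, monotonicity), CACE equals the intention-to-treat effect divided by the difference in compliance rates between arms. A minimal sketch (the function name is illustrative, not part of the thesis or the BayesCACE package):

```python
def cace_moment(y1, y0, c1, c0):
    """Moment estimator of the complier average causal effect.

    y1, y0 : mean outcome in the treatment / control arm (intention-to-treat)
    c1, c0 : proportion actually receiving active treatment in each arm
    Under monotonicity and the exclusion restriction,
    CACE = ITT effect / difference in compliance rates.
    """
    compliance_gap = c1 - c0
    if compliance_gap == 0:
        raise ValueError("no compliers identified")
    return (y1 - y0) / compliance_gap

# Example: 60% vs 50% event rate, with 80% vs 10% treatment uptake.
est = cace_moment(0.60, 0.50, 0.80, 0.10)  # = 0.1 / 0.7
```

A meta-analysis of such study-level estimates must then confront exactly the heterogeneity in compliance rates across studies that the hierarchical models above are built to handle.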
Second, we extend our approach to deal with incomplete noncompliance when some RCTs do not report noncompliance data. The results are illustrated by a re-analysis of a meta-analysis comparing the effect of epidural analgesia in labor versus no or other analgesia in labor on the outcome of cesarean section, where noncompliance varies substantially between studies. Simulations are performed to evaluate the performance of the proposed approach and to illustrate the importance of including appropriate random effects by showing the impact of over- and under-fitting. Furthermore, we develop an R package, BayesCACE, to provide user-friendly functions to implement CACE analysis for binary outcomes based on the proposed Bayesian hierarchical models. This package includes flexible functions for analyzing data from a single RCT and from a meta-analysis of multiple RCTs with either complete or incomplete noncompliance data. The package also provides functions for generating forest, trace, posterior density, and autocorrelation plots, and for reviewing noncompliance rates, visually assessing the model, and obtaining study-specific and overall CACEs.

Item Estimation of conditional average treatment effects (2014-07) Rolling, Craig Anthony
Researchers often believe that a treatment's effect on a response may be heterogeneous with respect to certain baseline covariates. This is an important premise of personalized medicine and direct marketing. Within a given set of regression models or machine learning algorithms, those that best estimate the regression function may not be best for estimating the effect of a treatment; therefore, there is a need for methods of model selection targeted to treatment effect estimation. In this thesis, we demonstrate an application of the focused information criterion (FIC) for model selection in this setting and develop a treatment effect cross-validation (TECV) procedure aimed at minimizing treatment effect estimation errors.
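As a rough illustration of what model selection targeted to treatment effect estimation means, the sketch below scores candidate outcome models on a validation split by comparing their predicted effects against matched treated/control outcome differences, so the model with the best effect estimates, rather than the best outcome fit, wins. This is a simplified stand-in with a one-dimensional covariate and invented helper names, not the thesis's TECV criterion:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_fit(x, y):
    b, a = np.polyfit(x, y, 1)          # slope, intercept
    return lambda x0: b * x0 + a

def const_fit(x, y):
    m = y.mean()
    return lambda x0: m

def te_cv_score(fit, x_tr, t_tr, y_tr, x_val, t_val, y_val):
    # Fit a separate outcome model in each arm on the training split.
    mu1 = fit(x_tr[t_tr == 1], y_tr[t_tr == 1])
    mu0 = fit(x_tr[t_tr == 0], y_tr[t_tr == 0])
    # Score predicted effects against matched-pair outcome differences.
    xt, yt = x_val[t_val == 1], y_val[t_val == 1]
    xc, yc = x_val[t_val == 0], y_val[t_val == 0]
    errs = []
    for xi, yi in zip(xt, yt):
        j = np.argmin(np.abs(xc - xi))   # nearest control on x
        proxy = yi - yc[j]               # noisy treatment-effect proxy
        errs.append((mu1(xi) - mu0(xi) - proxy) ** 2)
    return float(np.mean(errs))

# Heterogeneous effect tau(x) = x: the linear model should score better
# for effect estimation, even though both fit the outcome reasonably.
n = 600
x = rng.uniform(-1, 1, n)
t = rng.integers(0, 2, n)
y = 0.5 + x * t + rng.normal(0, 0.1, n)
tr, va = np.arange(n) < n // 2, np.arange(n) >= n // 2
s_lin = te_cv_score(linear_fit, x[tr], t[tr], y[tr], x[va], t[va], y[va])
s_con = te_cv_score(const_fit, x[tr], t[tr], y[tr], x[va], t[va], y[va])
```

Here the constant-effect model pays for ignoring the heterogeneity in tau(x), which an outcome-fit criterion alone could miss.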
Theoretically, TECV possesses a model selection consistency property when the data splitting ratio is properly chosen. Practically, TECV has the flexibility to compare different types of models and estimation procedures. In the usual regression settings, it is well established that model averaging (or more generally, model combining) frequently produces substantial performance gains over selecting a single model, and the same is true for the goal of treatment effect estimation. We develop a model combination method (TEEM) that properly weights each model based on its (estimated) accuracy for estimating treatment effects. When the baseline covariate is one-dimensional, the TEEM algorithm automatically produces a treatment effect estimate that converges at almost the same rate as the best model in a candidate set. We illustrate the methods of FIC, TECV, and TEEM with simulation studies, data from a clinical trial comparing treatments of patients with HIV, and a benchmark public policy dataset from a work skills training program. The examples show that the methods developed in this thesis often exhibit good performance for the important goal of estimating treatment effects conditional on covariates.

Item Integrating summarized imaging and genomic data with GWAS for powerful endophenotype association testing in Alzheimer’s Disease (2021-06) Knutson, Katherine
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits. However, for most diseases, individual risk variants have small effects which impact disease indirectly through upstream endophenotypes. To improve on the power and interpretability of GWAS, a number of approaches have been developed which aggregate contributions from one or multiple genetic variants to investigate the role of genetically regulated endophenotypes in complex traits.
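The aggregation idea described above can be sketched as a two-stage procedure: learn SNP weights for an endophenotype in a reference panel, then impute its genetically regulated component in the GWAS sample and test that against the trait. All sample sizes, effect sizes, and the plain least-squares fit are illustrative assumptions (real pipelines use penalized regression and much larger panels):

```python
import numpy as np

rng = np.random.default_rng(1)
n_ref, n_gwas, p = 500, 2000, 10

# Stage 1 (reference panel): learn SNP weights for the endophenotype.
G_ref = rng.binomial(2, 0.3, size=(n_ref, p)).astype(float)  # genotypes 0/1/2
w_true = np.zeros(p)
w_true[:3] = [0.5, -0.4, 0.3]                    # 3 causal SNPs (toy)
endo = G_ref @ w_true + rng.normal(0, 1, n_ref)
w_hat, *_ = np.linalg.lstsq(G_ref, endo, rcond=None)

# Stage 2 (GWAS sample): impute the endophenotype, test trait association.
G = rng.binomial(2, 0.3, size=(n_gwas, p)).astype(float)
trait = 0.3 * (G @ w_true) + rng.normal(0, 1, n_gwas)
x = G @ w_hat                                    # genetically regulated component
x = (x - x.mean()) / x.std()
y = (trait - trait.mean()) / trait.std()
beta = x @ y / (x @ x)
se = np.sqrt(((y - beta * x) @ (y - beta * x)) / (n_gwas - 2) / (x @ x))
z = beta / se                                    # association z-score
```

Because many variants contribute through one imputed endophenotype, a single well-powered test replaces many underpowered per-variant tests.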
These methods include Mendelian Randomization (MR) and the Transcriptome/Imaging-Wide Association Study (TWAS/IWAS), which test for associated gene expression and imaging phenotypes, respectively. In this dissertation, I will compare the performance of these approaches for detecting brain imaging-derived phenotypes (IDPs) associated with Alzheimer’s Disease. I will present novel extensions to the TWAS/IWAS framework to account for key biological factors which may impact their performance in practice, namely 1) genetic pleiotropy and 2) population substructure. The first of these factors, genetic pleiotropy, describes the phenomenon in which genetic loci affect multiple intermediate risk phenotypes. The presence of pervasive pleiotropy can result in inconsistent IWAS estimates. I will present a novel extension to the IWAS model (namely, MV-IWAS) which provides consistent causal estimates of endophenotype-trait associations by directly and indirectly accounting for pleiotropic pathways. The second of these factors, population substructure, describes ancestral variation in the underlying genetic architecture of endophenotypes. This variation can lead to ancestry-specific effects of gene expression in TWAS, which go undetected in the standard TWAS framework. Here, I will present a score test to detect heterogeneity in the effects of genetically regulated gene expression that is correlated with ancestry. By jointly analyzing samples from multiple populations, our multi-ancestry TWAS framework can improve power for detecting genes with shared expression-trait associations across populations through increased sample sizes, as compared to existing stratified TWAS approaches.

Item Network Structure Identification using Corrupt Data-Streams (2021-08) Subramanian, Venkat Ram
Many complex systems lend themselves to effective modeling as a network of dynamically interacting agents.
Such modeling is prevalent in many application domains, including climate science, neuroscience, the internet of things, power grids, and econometrics. The evolution of these systems is governed by the interdependencies and interactions between the agents, which can contain feedback loops. Identifying the presence or absence of influence pathways among the agents is of primary importance, as it enables subsequent analytics in networked systems such as identifying central agents and clusters, devising control strategies in distributed systems, and allocating resources. In most application domains, the nature of the relationships and interdependencies cannot be easily modeled using first principles. Furthermore, in many such systems it is not possible to deliberately affect the system, and thus passive or noninvasive methods are required. The existing methods of network identification do not account for the common ways in which data gets corrupted. In real-world systems, sensor readings can be inaccurate, clocks can get out of sync, and messages can be lost in transmission over a wireless network. The focus of this research is to incorporate realistic modeling assumptions on data streams and to characterize the effects of data corruption on network identification using passive means. We show that identifying the structure of networked systems from corrupt measurements results in the inference of spurious links. The effects of data corruption on network reconstruction are characterized with provable guarantees on the quality of reconstruction with respect to the generative models considered. We consider a wide range of generative models underlying the data streams, including static interactions (Markov random fields), linear time-invariant dynamical systems, and nonlinear dynamical models. We examine both causal and non-causal inference methods. In both cases, we provide an exact characterization of the location of spurious links.
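The localization phenomenon can be reproduced in a three-node toy example: in a chain X1 → X2 → X3, conditioning on a cleanly measured X2 blocks the X1-X3 path, but measurement noise on X2 (a corrupted node) makes a spurious X1-X3 partial correlation appear, precisely in the corrupted node's neighborhood. A numerical sketch under these assumptions (not the thesis's estimators):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Chain graph X1 -> X2 -> X3: the true network has no direct (1,3) edge.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)

def partial_corr_13(a, b, c):
    """Partial correlation of a and c given b, via the precision matrix:
    r_ij = -P_ij / sqrt(P_ii * P_jj)."""
    P = np.linalg.inv(np.cov(np.vstack([a, b, c])))
    return -P[0, 2] / np.sqrt(P[0, 0] * P[2, 2])

clean = partial_corr_13(x1, x2, x3)              # ~ 0: no spurious edge
x2_corrupt = x2 + rng.normal(0, 1.0, n)          # sensor noise on node 2
corrupt = partial_corr_13(x1, x2_corrupt, x3)    # nonzero: spurious 1-3 link
```

The spurious edge connects the two neighbors of the corrupted node and nowhere else, matching the localization result stated next.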
Our results show that the spurious links are localized to the neighborhood of the corrupted node. All our solution methodologies utilize only the time-series observations, without any knowledge of the system parameters. Our precise characterization of the erroneous links is further exploited when the network has special structural properties. There are several physical systems, especially flow-driven systems like power grids, heat transfer networks, and fluid flow networks, where every dynamic coupling between the agents/nodes is bi-directional. In such systems, identifying a unidirectional link in reconstruction leads to the conclusion that the link arises from data corruption. We utilize our precise characterization of spurious links to detect and localize all corrupt nodes in the network. Learning the exact network representation of such systems, free of spurious links, is imperative for performing accurate state estimation, control, and optimization. To this end, we develop methods to remove all spurious links and identify the exact structure of bi-directed networks despite data corruption.

Item Policy-relevant causal effect estimators for heterogeneous treatments and exposures (2021-12) Lyden, Grace
Most statistical methods for causal inference are designed to handle contrasts between well-defined treatment groups, e.g., vaccine versus placebo. In real-world applications, however, these contrasts might fail to answer relevant questions for patients and policy makers. This dissertation introduces new policy-relevant causal estimators that target the effects of heterogeneous treatments and exposures in observational data. Chapter 2 is motivated by correlated chemical mixtures. Historically, environmental health researchers have estimated separate effects of each chemical in a family. More recently, federal agencies have called for estimation of overall mixture effects, which acknowledge the potential real-world burden of simultaneous exposure.
A secondary goal of mixtures research is to identify the most harmful components for regulation. Weighted Quantile Sum (WQS) regression has emerged to answer this call. WQS assigns a regularized weight to each chemical in a mixture through a form of bootstrap aggregation, then tests the effect of the weighted sum in a held-out dataset. Although popular, WQS is limited by its dependence on data splitting, which is needed to preserve Type I error. In Chapter 2, we propose the first modification of WQS that does not rely on data splitting and replaces the second step of WQS with a permutation test that yields correct p-values. To offset the added computational burden of a permutation test, we suggest alternatives to the bootstrap for regularization of the weights, namely L1 and L2 penalization, and discuss how to choose the appropriate penalty given expert knowledge about the mixture of interest. Chapters 3 and 4 are motivated by the difficult decisions faced by candidates for organ transplant. Due to organ scarcity, these patients typically have to wait to receive an offer of a suitable deceased-donor organ or possibly pursue living-donor transplant, depending on the organ needed. For patients who have that choice, the difference in post-transplant survival between living- and deceased-donor transplant is a straightforward quantity to estimate, but might not be particularly helpful to patients who experience real-world variability in wait time and offered-organ quality. A more useful contrast, therefore, is the survival difference between treatment strategies that account for this uncertainty, such as "wait for deceased-donor transplant," which could encompass many possible wait times and donor organ qualities. Decisions for patients today are further complicated if versions of treatment have changed over time, for example if the rate of transplant has changed due to evolutions in allocation policy. 
We therefore introduce the concept of a generalized representative intervention (GRI): a random dynamic treatment regime that assigns versions of treatment according to their distribution among similar patients in a target population under a loosely defined strategy of interest. Chapter 3 proposes a class of weighted product-limit estimators of marginal survival under a GRI, which are consistent, asymptotically normal, and have excellent finite-sample performance in a realistic simulation study. Chapter 4 extends this work to determine the optimal strategy for an individual based on their expected rate of treatment in the target population. Specifically, we propose a marginal structural modeling approach that allows a patient-specific relative rate of treatment to modify the effects of the GRIs under consideration. We apply our methods to data from the Scientific Registry of Transplant Recipients to determine the optimal strategy for kidney-pancreas transplant candidates under the current organ allocation system.

Item Statistical Methods for Variable Selection in Causal Inference (2018-07) Koch, Brandon Lee
Estimating the causal effect of a binary intervention or action (referred to as a "treatment") on a continuous outcome is often an investigator's primary goal. Randomized trials are ideal for estimating causal effects because randomization eliminates selection bias in treatment assignment. However, randomized trials are not always ethically or practically possible, and observational data must be used to estimate the causal effect of treatment. Unbiased estimation of causal effects with observational data requires adjustment for confounding variables that are related to both the outcome and treatment assignment. Adjusting for all measured covariates in a study protects against bias, but including covariates unrelated to the outcome may increase the variability of the estimated causal effect.
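The variance side of that trade-off is easy to demonstrate: adjusting for a covariate that predicts treatment but not the outcome removes no bias, yet inflates the standard error of the effect estimate. A toy sketch with a continuous treatment and made-up coefficients:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

z = rng.normal(size=n)             # predicts treatment only, not outcome
t = z + rng.normal(size=n)         # continuous "treatment" for simplicity
y = 1.0 * t + rng.normal(size=n)   # z affects y only through t

def effect_se(y, cols):
    """OLS standard error of the coefficient on the first covariate."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

se_plain = effect_se(y, [t])       # treatment only
se_bloat = effect_se(y, [t, z])    # adds the outcome-unrelated covariate
# se_bloat / se_plain is about sqrt(2) here: variance inflation, no bias removed
```

Conditioning on z strips out the half of t's variation that z explains, leaving less information for estimating the effect.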
Standard variable selection techniques aim to maximize the predictive ability of a model for the outcome and are used to decrease the variability of the estimated causal effect, but they ignore covariate associations with treatment and may not adjust for important confounders weakly associated with the outcome. We propose two approaches for estimating causal effects that simultaneously consider models for both outcome and treatment assignment. The first approach is a variable selection technique for identifying confounders and predictors of outcome using an adaptive group lasso approach that simultaneously performs coefficient selection, regularization, and estimation across the treatment and outcome models. In the second approach, two methods are proposed that simultaneously model outcome and treatment assignment using a Bayesian formulation with spike-and-slab priors on each covariate coefficient; the Spike and Slab Causal Estimator (SSCE) aims to achieve minimum bias of the causal effect estimator, while Bilevel SSCE (BSSCE) aims to minimize its mean squared error. We also propose TEHTrees, a new method that combines matching and conditional inference trees to characterize treatment effect heterogeneity. One of its main virtues is that, by employing formal hypothesis testing procedures in constructing the tree, TEHTrees preserves the Type I error rate.
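The failure mode motivating both approaches, a confounder strongly associated with treatment but only weakly with the outcome, can be seen in a small simulation: outcome-focused selection would judge x1 nearly negligible, yet omitting it biases the effect estimate. Variable names and coefficients are illustrative, not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# x1: weak outcome association (0.2) but strong treatment association.
# x2: pure outcome predictor. True treatment effect = 1.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
t = (x1 + rng.normal(size=n) > 0).astype(float)
y = 1.0 * t + 0.2 * x1 + 1.0 * x2 + rng.normal(size=n)

def ols_effect(y, t, covs):
    """OLS coefficient on t after adjusting for the given covariates."""
    X = np.column_stack([np.ones(len(y)), t] + covs)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

naive = ols_effect(y, t, [x2])       # outcome-focused: drops the weak x1
adjusted = ols_effect(y, t, [x1, x2])  # confounder retained
```

The naive fit overstates the effect through omitted-variable bias, while the adjusted fit recovers it; selection that also looks at the treatment model keeps x1 in.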