Browsing by Subject "Missing data"
Now showing 1 - 4 of 4
Item Mediation analysis in longitudinal studies in the presence of measurement error and missing data (2018-05) Ssenkusu, John Mbaziira
Mediation analysis hypothesizes that an exposure causes a mediator and that the mediator in turn causes the outcome, so mediation is inherently longitudinal. Unfortunately, potential mediators may be measured with error, and regression estimators obtained by ignoring measurement error can be severely biased. This can induce bias in the estimation of causal direct and indirect effects. In Chapter 2, using regression calibration, we show how to adjust for measurement error in longitudinal studies with repeated measurements of the mediator, and evaluate the effect of ignoring measurement error on direct and indirect effects. Rather than assuming normality for the random effects in the linear mixed effects calibration model, we correct for measurement error in the mediator while allowing flexibility in the distribution of subject-specific random effects. Longitudinal studies also face challenges of missing data resulting from loss to follow-up, death, or withdrawal. In mediation analysis, multiple imputation has been shown to perform well for data missing completely at random (MCAR) and missing at random (MAR) in cross-sectional studies, but it is unclear how it performs in longitudinal studies under misspecification of the imputation model, specifically when the imputation model ignores clustering by subject. In Chapter 3, we examine the impact of ignoring clustering on mediated effect estimates under MCAR and MAR mechanisms with varying degrees of missingness. In Chapter 4, using data from a randomized controlled trial, we examine the mediated effects of intermittent preventive malaria treatment in pregnant women on child neurodevelopment.
Chapter 5 concludes and discusses future work.
Item Quantile regression model selection (2014-05) Sherwood, Benjamin Stanley
Quantile regression models the conditional quantile of a response variable. Compared to least squares, which focuses on the conditional mean, it provides a more complete picture of the conditional distribution. Median regression, a special case of quantile regression, offers a robust alternative to least squares methods. Common regression assumptions are that there is a linear relationship between the covariates and the response, that there is no missing data, and that the sample size is larger than the number of covariates. In this dissertation we examine how to use quantile regression models when these assumptions do not hold. In all settings we examine the issue of variable selection and present methods that have the property of model selection consistency, that is, if the true model is one of the candidate models, then these methods select the true model with probability approaching one as the sample size increases.
We consider partial linear models to relax the assumption that there is a linear relationship between the covariates and the response. Partial linear models assume some covariates have a linear relationship with the response while other covariates have an unknown non-linear relationship. These models provide the flexibility of non-parametric methods while having ease of interpretation for the targeted parametric components. Additive partial linear models assume an additive form between the non-linear covariates, which allows for a flexible model that avoids the "curse of dimensionality". We examine additive partial linear quantile regression models using basis splines to model the non-linear relationships.
In practice missing data is a common problem, and estimates can be biased if observations with missing data are dropped from the analysis. Imputation is a popular approach to handle missing data, but imputation methods typically require distributional assumptions.
An advantage of quantile regression is that it does not require any distributional assumptions about the response or the covariates. To remain in a distribution-free setting, a different approach is needed. We use a weighted objective function that provides more weight to observations that are representative of subjects that are likely to have missing data. This approach is analyzed for both the linear and additive partial linear settings, while considering model selection for the linear covariates. In mean regression analysis, detecting outliers and checking for non-constant variance are standard model-checking steps. With high-dimensional data, checking these conditions becomes increasingly cumbersome. Quantile regression offers an alternative that is robust to outliers in the Y direction and directly models heteroscedastic behavior. Penalized quantile regression is considered to accommodate models where the number of covariates is larger than the sample size. The additive partial linear model is extended to the high-dimensional case. We consider the setting where the number of linear covariates increases with the sample size, but the number of non-linear covariates remains fixed. To create a sparse model we compare the LASSO and SCAD penalties for the linear components.
Item The robustness of multilevel multiple imputation for handling missing data in hierarchical linear models (2013-06) Medhanie, Amanuel Gebri
Missing data often present problems for credible statistical analyses. There are valid methods for dealing with missing data, but the context in which the data are missing can affect the performance of these methods. Relatively little is known about the proper way to handle missing data in multilevel data structures. This study used a Monte Carlo simulation to compare the performance of three missing data methods on multilevel data (multilevel multiple imputation, multiple imputation ignoring the multilevel structure, and listwise deletion).
The comparison of these methods was made under conditions known or believed to influence the performance of both missing data methods and multilevel modeling. The results suggest that listwise deletion performed well compared to multilevel multiple imputation, but multiple imputation ignoring the multilevel structure performed poorly. The implications of these results for educational research are discussed.
Item Statistical methods for multivariate meta-analysis (2018-07) Lian, Qinshu
As health problems become more complicated, medical decisions and policies are rarely determined by evidence on a single effect. In recent years, there has been wide acknowledgment of the drawbacks of using separate univariate meta-analyses to solve a clearly multivariate problem. This has led to increased attention to multivariate meta-analysis, which is a generalization of standard univariate meta-analysis to synthesize evidence on multiple outcomes or treatments. Recent developments in multivariate meta-analysis have been driven by a wide variety of application areas. This thesis focuses on three areas in which multivariate meta-analysis is highly important but is not yet well developed: network meta-analysis of diagnostic tests, meta-analysis of observational studies accounting for exposure misclassification, and meta-regression methods adjusting for post-randomization variables. In studies evaluating the accuracy of diagnostic tests, three designs are commonly used: crossover, randomized, and non-comparative. Existing methods for meta-analysis of diagnostic tests mainly consider simple cases in which the reference test in all or none of the studies can be considered a gold standard test, and in which all studies use either a randomized or non-comparative design.
To overcome the limitations of current methods, the Bayesian hierarchical summary receiver operating characteristic model is extended to network meta-analysis of diagnostic tests to simultaneously compare multiple tests within a missing data framework. The method accounts for correlations between multiple tests and for heterogeneity between studies. It also allows different studies to include different subsets of diagnostic tests and provides flexibility in the choice of summary statistics. In observational studies, misclassification of exposure is ubiquitous and can substantially bias the estimated association between an outcome and an exposure. Although misclassification in a single observational study has been well studied, few papers have considered it in a meta-analysis. A novel Bayesian approach is proposed to fill this methodological gap. We simultaneously synthesize two (or more) meta-analyses, with one on the association between a misclassified exposure and an outcome (main studies), and the other on the association between the misclassified exposure and the true exposure (validation studies). We extend the current scope for using external validation data by relaxing the "transportability" assumption by means of random effects models. The proposed model accounts for heterogeneity between studies and can be extended to allow different studies to have different exposure measurements. Meta-regression is widely used in systematic reviews to investigate sources of heterogeneity and the association of study-level covariates with treatment effectiveness. Although existing meta-regression approaches have been successful in adjusting for baseline covariates, these methods have several limitations in adjusting for post-randomization variables. We propose a joint meta-regression method adjusting for post-randomization variables. The proposed method simultaneously estimates the treatment effect on the primary outcome and on the post-randomization variables.
It accounts for both between- and within-study variability in post-randomization variables. Missing data are allowed in the primary outcome and the post-randomization variables, and uncertainty in the missing data is taken into account. All the proposed models are evaluated in simulation studies and are illustrated using real meta-analytic datasets.
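For readers unfamiliar with the "standard univariate meta-analysis" that the last abstract generalizes, the following is a minimal sketch of random-effects pooling using the DerSimonian-Laird moment estimator of between-study heterogeneity. It is not the Bayesian multivariate method proposed in the thesis. The function name `dersimonian_laird` and its arguments are hypothetical and chosen here only for illustration.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with a univariate random-effects model.

    effects:   per-study effect estimates (e.g., log odds ratios)
    variances: per-study within-study variances (must be positive)
    Returns (pooled_effect, standard_error, tau2), where tau2 is the
    DerSimonian-Laird estimate of between-study variance.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures dispersion of study effects around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Moment estimator of between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, se, tau2
```

When the study effects agree, the estimated between-study variance is truncated to zero and the result reduces to the fixed-effect estimate. Larger disagreement between studies increases `tau2` and widens the standard error of the pooled effect.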