The ubiquity of healthcare data allows for complex analyses of a variety of topics ranging from healthcare cost to cognitive decline in dementia patients. Healthcare datasets are often highly skewed and heteroskedastic posing great challenges for statistical analyses. Quantile regression is an effective tool for analyzing healthcare datasets because, compared with mean regression, quantile regression has weaker assumptions which are more appropriate for complex data. Additionally, quantile regression models conditional quantiles of the response variable providing a more complete picture of the conditional distribution. In this dissertation, we propose three solutions to challenges in healthcare data analysis. All three solutions either directly rely on quantile regression or extend existing methodology and algorithms. Motivated by the Medical Expenditure Panel Survey containing data from individuals’ medical providers and employers across the United States, we propose a new semiparametric procedure for predicting whether a patient will incur high medical expenditure. The common practice is to artificially dichotomize the response. We propose a new semiparametric prediction rule to classify whether a future response occurs at the upper tail of the response distribution. The new method can be considered a semiparametric estimator of the Bayes rule for classification and enjoys some nice features. It incorporates nonlinear covariate effects and can be adapted to construct a prediction interval and hence provides more information about the future response. Next, we extend semiparametric quantile regression methodology to longitudinal studies with non-ignorable dropout. Dropout occurs when a patient leaves a study prior to its conclusion. Non-ignorable dropout occurs when the probability of dropout depends on the response. Failing to account for non-ignorable dropout can result in biased estimation. To handle dropout, we propose a weighted semiparametric quantile regression estimator where the weights are inversely proportional to the estimated probability remaining in the study. We show that this weighted estimator gives unbiased estimates of linear effects. We illustrate the advantages of the proposed method on a subset of the National Alzheimer’s Coordinating Center Uniform Data Set tracking cognitive decline in dementia patients. Lastly, we turn our attention to the issue of analyzing very large datasets with a large number of covariates and sample size. Penalized quantile regression is often used to simultaneously select variables and estimate effects by fitting models at many values of a tuning parameter. Existing algorithms have focused on improving computation time at one value of a tuning parameter, however obtaining model estimates for all values of the tuning parameter can still be prohibitively time-consuming. Instead of attempting to solve the penalized quantile regression problem for each value of a tuning parameter, we propose a sparsity path algorithm to approximate the solution allowing for fast exploration of candidate models at many different sparsity levels. Simulations show that the true model is always contained in the set of candidate models returned by the proposed sparsity path algorithm.
University of Minnesota Ph.D. dissertation. 2018. Major: Statistics. Advisor: Lan Wang. 1 computer file (PDF); 122 pages.
Semiparametric Quantile Regression and Applications to Healthcare Data Analysis.
Retrieved from the University of Minnesota Digital Conservancy,
Content distributed via the University of Minnesota's Digital Conservancy may be subject to additional license and use restrictions applied by the depositor.