Browsing by Subject "Model selection"
Now showing 1 - 6 of 6
Item Estimation of conditional average treatment effects (2014-07) Rolling, Craig Anthony
Researchers often believe that a treatment's effect on a response may be heterogeneous with respect to certain baseline covariates. This is an important premise of personalized medicine and direct marketing. Within a given set of regression models or machine learning algorithms, those that best estimate the regression function may not be best for estimating the effect of a treatment; therefore, there is a need for methods of model selection targeted to treatment effect estimation. In this thesis, we demonstrate an application of the focused information criterion (FIC) for model selection in this setting and develop a treatment effect cross-validation (TECV) aimed at minimizing treatment effect estimation errors. Theoretically, TECV possesses a model selection consistency property when the data splitting ratio is properly chosen. Practically, TECV has the flexibility to compare different types of models and estimation procedures.

In the usual regression settings, it is well established that model averaging (or more generally, model combining) frequently produces substantial performance gains over selecting a single model, and the same is true for the goal of treatment effect estimation. We develop a model combination method (TEEM) that properly weights each model based on its (estimated) accuracy for estimating treatment effects. When the baseline covariate is one-dimensional, the TEEM algorithm automatically produces a treatment effect estimate that converges at almost the same rate as the best model in a candidate set.

We illustrate the methods of FIC, TECV, and TEEM with simulation studies, data from a clinical trial comparing treatments of patients with HIV, and a benchmark public policy dataset from a work skills training program.
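The data-splitting idea behind this kind of treatment effect cross-validation can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than the thesis's exact procedure: the two candidate models, the IPW-style pseudo-outcome (valid because the simulated treatment is randomized with known probability 0.5), and the single split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: one baseline covariate x, randomized binary treatment t
# with P(t = 1) = 0.5, and a heterogeneous true effect 1 + 3x.
n = 2000
x = rng.uniform(-1, 1, n)
t = rng.integers(0, 2, n)
y = 0.5 * x + t * (1 + 3 * x) + rng.normal(0, 0.3, n)

def fit_and_effect(x_tr, t_tr, y_tr, x_ev, interaction):
    """Least-squares fit of y ~ 1 + x + t (+ x*t); returns the estimated
    treatment effect E[y|x, t=1] - E[y|x, t=0] at the evaluation points."""
    cols = [np.ones_like(x_tr), x_tr, t_tr]
    if interaction:
        cols.append(x_tr * t_tr)
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y_tr, rcond=None)
    return beta[2] + (beta[3] * x_ev if interaction else 0.0)

# Split the data: fit each candidate on one half, score it on the other.
half = n // 2
# IPW-style pseudo-outcome whose conditional mean equals the true effect
# (this relies on the known randomization probability of 0.5).
pseudo = 2 * (2 * t[half:] - 1) * y[half:]

scores = {}
for name, inter in [("no-interaction", False), ("interaction", True)]:
    eff = fit_and_effect(x[:half], t[:half], y[:half], x[half:], inter)
    scores[name] = np.mean((pseudo - eff) ** 2)

best = min(scores, key=scores.get)
```

A model that fits the mean well but forces a constant effect loses here to the interaction model, which is the point: the selection criterion targets effect estimation rather than regression fit.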
The examples show that the methods developed in this thesis often exhibit good performance for the important goal of estimating treatment effects conditional on covariates.

Item Image modeling and enhancement via structured sparse model selection (University of Minnesota. Institute for Mathematics and Its Applications, 2010-01) Yu, Guoshen; Sapiro, Guillermo; Mallat, Stéphane

Item An Inferential Perspective on Data Depth (2017-05) Majumdar, Subhabrata
Data depth provides a plausible extension of robust univariate quantities such as ranks, order statistics, and quantiles to the multivariate setup. Although depth has gained visibility and has seen many applications in recent years, especially in classification problems for multivariate and functional data, its generalizability and utility in achieving traditional parametric inferential goals are largely unexplored. In this thesis we develop several approaches to address this. Firstly, we define an evaluation map function that is more general than data depth, and establish several results in a parametric modelling context using a broad definition of a statistical model. A fast algorithm for covariate selection using data depths as evaluation functions arises as a special case of this. We demonstrate applications of this framework on data from diverse fields, namely climate science, medical imaging, and behavioral genetics. Secondly, we propose a multivariate rank transformation using data depth and use it for robust inference in location and scale problems for elliptical distributions. Thirdly, we lay out a depth-based regularization framework in multi-response regression, and derive a new method of nonconvex penalized sparse regression in the multitask situation.
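To make the depth notion concrete: even a simple depth function such as Mahalanobis depth yields a center-outward ordering of multivariate data, giving analogues of ranks and the median. This is a generic illustration only; the thesis works with more general depth and evaluation functions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=200)

def mahalanobis_depth(points, data):
    """Mahalanobis depth D(x) = 1 / (1 + (x - mu)' S^{-1} (x - mu));
    larger depth means a more central point."""
    mu = data.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, s_inv, diff)
    return 1.0 / (1.0 + d2)

depth = mahalanobis_depth(X, X)
# Depth induces multivariate analogues of univariate ranks and the median:
center_outward_ranks = (-depth).argsort().argsort()  # 0 = deepest point
deepest = X[np.argmax(depth)]  # a multivariate "median"
```

The depth values lie in (0, 1], and the deepest observation plays the role of a multivariate median, which is the starting point for the depth-based ranks and inference the abstract describes.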
Across the thesis, several simulation studies and real data examples demonstrate the effectiveness of the methods developed here.

Item Minimax estimation and model identification for high-dimensional regression (2012-08) Wang, Zhan
This dissertation consists of two parts. In Part I, adaptive minimax estimation over sparse ℓq-hulls is studied. Given a dictionary of Mn initial estimates of the unknown regression function, we aim to construct linearly aggregated estimators that target the best performance among all linear combinations under a sparse ℓq-norm (0 ≤ q ≤ 1) constraint. Besides identifying the optimal rates of aggregation for these ℓq-aggregation problems, our multi-directional (or adaptive) strategies by model mixing or model selection achieve the optimal rates simultaneously over the full range of 0 ≤ q ≤ 1 for general Mn and upper bound tn of the ℓq-norm. Both random and fixed designs, with known or unknown error variance, are handled, and the ℓq-aggregations examined in this work cover major types of aggregation problems previously studied in the literature. Consequences on minimax-rate adaptive regression under ℓq-constrained coefficients are also provided. In Part II, the relationship between consistency and minimax-rate optimality in possibly high-dimensional regression estimation is investigated. In model selection where the true model is fixed, it is now well known that if a model selection method is consistent, it cannot be minimax-rate optimal at the same time. We investigate this conflict in a high-dimensional regression setting where the true model is a changing target, and show that consistency and minimax-rate optimality may co-exist in a single model selection method.
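The contrast between model selection and model mixing over a dictionary can be sketched for the q = 1 case, where the aggregation weights live on the simplex. The polynomial dictionary, the exponential-weighting rule, and the assumed-known noise variance below are all illustrative choices, not the dissertation's estimators.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0, 1, n)
f_true = np.sin(2 * np.pi * x)
y = f_true + rng.normal(0, 0.4, n)

# Dictionary of M_n initial estimates: polynomial fits of increasing degree,
# trained on the first half of the data.
half = n // 2
degrees = range(1, 7)
preds = np.column_stack([
    np.polyval(np.polyfit(x[:half], y[:half], d), x[half:]) for d in degrees
])
resid2 = ((y[half:, None] - preds) ** 2).mean(axis=0)

# Model selection: put all weight on the single best dictionary member.
selected = preds[:, np.argmin(resid2)]

# Model mixing: exponential weights on the simplex (the l1, i.e. q = 1, case).
sigma2 = 0.16  # noise variance, assumed known for this sketch
a = -(resid2 - resid2.min()) * half / (2 * sigma2)
w = np.exp(a)
w /= w.sum()
mixed = preds @ w

sel_err = np.mean((selected - f_true[half:]) ** 2)
mix_err = np.mean((mixed - f_true[half:]) ** 2)
```

Selection corresponds to a vertex of the simplex; mixing spreads weight and typically stabilizes the estimate, which is the behavior the adaptive strategies in Part I exploit.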
Our results provide a comprehensive guideline for the characteristics of a model selection method that can be consistent and minimax-rate optimal at the same time.

Item Quantile regression model selection (2014-05) Sherwood, Benjamin Stanley
Quantile regression models the conditional quantile of a response variable. Compared to least squares, which focuses on the conditional mean, it provides a more complete picture of the conditional distribution. Median regression, a special case of quantile regression, offers a robust alternative to least squares methods. Common regression assumptions are that there is a linear relationship between the covariates and the response, that there is no missing data, and that the sample size is larger than the number of covariates. In this dissertation we examine how to use quantile regression models when these assumptions do not hold. In all settings we examine the issue of variable selection and present methods that have the property of model selection consistency: if the true model is one of the candidate models, then these methods select the true model with probability approaching one as the sample size increases.

We consider partial linear models to relax the assumption of a linear relationship between the covariates and the response. Partial linear models assume some covariates have a linear relationship with the response while other covariates have an unknown non-linear relationship. These models provide the flexibility of non-parametric methods while retaining ease of interpretation for the targeted parametric components. Additive partial linear models assume an additive form among the non-linear covariates, which allows for a flexible model that avoids the "curse of dimensionality". We examine additive partial linear quantile regression models using basis splines to model the non-linear relationships.

In practice, missing data is a common problem and estimates can be biased if observations with missing data are dropped from the analysis.
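The check (pinball) loss that underlies quantile regression can be demonstrated in a few lines: minimizing the average check loss over a constant recovers the sample quantile. The simulated data and the grid search (standing in for the linear-programming fits used in practice) are illustrative only.

```python
import numpy as np

def pinball_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=5000)

tau = 0.75
# Minimizing the average check loss over a constant c recovers the
# tau-th sample quantile -- the building block of quantile regression,
# where c is replaced by a regression function of the covariates.
grid = np.linspace(y.min(), y.max(), 2000)
losses = [pinball_loss(y - c, tau).mean() for c in grid]
c_hat = grid[np.argmin(losses)]
```

Because the loss is defined without reference to any error distribution, the same objective works for skewed, heavy-tailed, or heteroscedastic responses, which is the distribution-free property the abstract emphasizes.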
Imputation is a popular approach to handling missing data, but imputation methods typically require distributional assumptions. An advantage of quantile regression is that it does not require any distributional assumptions about the response or the covariates. To remain in a distribution-free setting, a different approach is needed. We use a weighted objective function that gives more weight to observations representative of subjects that are likely to have missing data. This approach is analyzed for both the linear and the additive partial linear setting, while considering model selection for the linear covariates.

In mean regression analysis, detecting outliers and checking for non-constant variance are standard model-checking steps. With high-dimensional data, checking these conditions becomes increasingly cumbersome. Quantile regression offers an alternative that is robust to outliers in the Y direction and directly models heteroscedastic behavior. Penalized quantile regression is considered to accommodate models where the number of covariates is larger than the sample size. The additive partial linear model is extended to the high-dimensional case. We consider the setting where the number of linear covariates increases with the sample size, but the number of non-linear covariates remains fixed. To create a sparse model we compare the LASSO and SCAD penalties for the linear components.

Item Robust combinations of statistical procedures (2010-11) Wei, Xiaoqiao
As an alternative to model selection, model combination gives a combined result from the individual candidate models to share their strengths. Yang (2001, 2004) proposed square-loss-based combining methods for regression analysis and forecast combinations. In this work, we propose robust combinations of statistical procedures.
The theoretical properties of the robust combination methods are obtained, showing that the combined procedure automatically performs as well as the best one among the candidate models in estimation or prediction. Systematic simulations and data examples show that the robust methods outperform the square-loss-based combining methods when outliers are likely to occur, and perform similarly to them when there are no outliers.
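A minimal sketch of the robust combining idea: weight candidate procedures by their absolute-error (rather than squared-error) performance on possibly contaminated validation data, then mix. The two candidates, the exponential-weighting rule, the tuning constant 20, and the contamination scheme are all illustrative assumptions, not the methods of the thesis.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_data(n, n_outliers=0):
    x = rng.uniform(0, 1, n)
    y = 2 * x + rng.normal(0, 0.3, n)
    if n_outliers:
        idx = rng.choice(n, size=n_outliers, replace=False)
        y[idx] += 30.0  # gross outliers
    return x, y

# Two candidate procedures, treated as black boxes:
candidates = {
    "good": lambda x: 2 * x,           # close to the true regression function
    "bad": lambda x: np.ones_like(x),  # ignores the covariate entirely
}

# Weight each candidate by its absolute-error performance on
# contaminated validation data, then combine.
xv, yv = make_data(200, n_outliers=10)
abs_loss = {k: np.mean(np.abs(yv - f(xv))) for k, f in candidates.items()}
lo = min(abs_loss.values())
w = {k: np.exp(-20.0 * (v - lo)) for k, v in abs_loss.items()}
total = sum(w.values())
w = {k: v / total for k, v in w.items()}

# On clean test data the combination tracks the better candidate.
xt, yt = make_data(500)
combined = sum(w[k] * f(xt) for k, f in candidates.items())
err = {k: float(np.mean(np.abs(yt - f(xt)))) for k, f in candidates.items()}
err["combined"] = float(np.mean(np.abs(yt - combined)))
```

Because the absolute loss grows only linearly in the residual, a handful of gross outliers in the validation set cannot swamp the loss comparison the way squared residuals can, so the weights still concentrate on the better candidate.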