Model Selection and Estimation for High-dimensional Data Analysis

Thesis or Dissertation


In the era of big data, uncovering useful information and hidden patterns in data is a common goal across many fields. However, effectively selecting input variables and estimating their effects remains challenging. In this thesis, our goal is to develop reproducible statistical approaches that provide mechanistic explanations of the phenomena observed in big data analysis. The thesis contains two parts: variable selection and model estimation. The first part investigates how to measure and interpret the usefulness of an input variable through an approach called “variable importance learning,” and builds tools (methodology and software) that can be widely applied. We propose two variable importance measures: a parametric measure, SOIL, based on model combining, and a nonparametric measure, CVIL, based on cross-validation. The SOIL method is shown theoretically to have the inclusion/exclusion property: when the model weights are properly concentrated around the true model, the SOIL importance clearly separates the variables in the true model from the rest. The CVIL method possesses desirable theoretical properties and enhances the interpretability of many effective but opaque machine learning methods. The second part focuses on how to estimate the effect of a useful input variable when two input variables interact. We investigate the minimax rate of convergence for regression estimation in high-dimensional sparse linear models with two-way interactions, and construct an adaptive estimator that achieves the minimax rate of convergence regardless of the true heredity condition and the sparsity indices.
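The model-combining idea behind a measure like SOIL can be illustrated with a short sketch. It assumes that the importance of variable j is the total weight of the candidate models that include j, and it assumes BIC-based weights w_k ∝ exp(−BIC_k/2) purely for illustration; the thesis's actual candidate-model construction and weighting scheme may differ. The function names `bic_weights` and `weighted_inclusion_importance` and the toy BIC values are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Sequence


def bic_weights(bic_values: Sequence[float]) -> List[float]:
    """Hypothetical weighting: w_k proportional to exp(-BIC_k / 2), normalized to sum to 1.

    Subtracting the smallest BIC before exponentiating avoids overflow
    and leaves the normalized weights unchanged.
    """
    best = min(bic_values)
    raw = [math.exp(-(b - best) / 2.0) for b in bic_values]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_inclusion_importance(
    models: Sequence[FrozenSet[int]],
    weights: Sequence[float],
    p: int,
) -> Dict[int, float]:
    """Importance of variable j = sum of weights of the candidate models containing j."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per candidate model")
    importance = {j: 0.0 for j in range(p)}
    for model, w in zip(models, weights):
        for j in model:
            importance[j] += w
    return importance


# Toy example with p = 4 variables and three candidate models (made-up BIC values).
models = [frozenset({0, 1}), frozenset({0}), frozenset({0, 1, 2})]
w = bic_weights([100.0, 104.0, 103.0])
imp = weighted_inclusion_importance(models, w, p=4)
print(imp)
```

In this toy run, variable 0 appears in every candidate model and receives importance 1, variable 3 appears in none and receives 0, and variable 1 (in the two best-scoring models) scores well above variable 2. When the weights concentrate on models near the true one, this weighted-inclusion form is what lets the importance separate true variables from the rest.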



University of Minnesota Ph.D. dissertation. June 2019. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); viii, 139 pages.

Suggested citation

Ye, Chenglong. (2019). Model Selection and Estimation for High-dimensional Data Analysis. Retrieved from the University Digital Conservancy,