Model Selection and Estimation for High-dimensional Data Analysis

In the era of big data, uncovering useful information and hidden patterns in data is a common task across many fields. However, it is challenging to effectively select input variables and estimate their effects. In this thesis, our goal is to develop reproducible statistical approaches that provide mechanistic explanations of the phenomena observed in big data analysis. The thesis contains two parts: variable selection and model estimation. The first part investigates how to measure and interpret the usefulness of an input variable using an approach called “variable importance learning” and builds tools (methodology and software) that can be widely applied. We propose two variable importance measures, a parametric measure SOIL and a nonparametric measure CVIL, using the ideas of model combining and cross-validation, respectively. The SOIL method is theoretically shown to have the inclusion/exclusion property: when the model weights concentrate properly around the true model, the SOIL importance separates the variables in the true model from the rest. The CVIL method possesses desirable theoretical properties and enhances the interpretability of many effective but hard-to-interpret machine learning methods. The second part focuses on how to estimate the effect of a useful input variable when two input variables interact. We investigate the minimax rate of convergence for regression estimation in high-dimensional sparse linear models with two-way interactions, and construct an adaptive estimator that achieves the minimax rate of convergence regardless of the true heredity condition and the sparsity indices.
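To make the model-combining idea behind an importance measure such as SOIL concrete, the following is a minimal sketch, not the thesis's implementation. It assumes that candidate models are all small subsets of the inputs, that each model is fitted by ordinary least squares, and that model weights are proportional to exp(-BIC/2); the weighting scheme actually used in the thesis may differ. The function name `weighted_importance`, the `max_size` parameter, and the simulated data are hypothetical and serve only as illustration. A variable's importance is taken to be the total weight of the candidate models that contain it.

```python
import itertools
import numpy as np


def weighted_importance(X, y, max_size=3):
    """Importance of each column of X as the total weight of models containing it.

    Assumption: weights are proportional to exp(-BIC/2) over all subsets of
    at most `max_size` variables (an illustrative choice, not the thesis's).
    """
    n, p = X.shape
    models, bics = [], []
    for size in range(max_size + 1):
        for subset in itertools.combinations(range(p), size):
            # Design matrix: intercept plus the selected columns.
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ coef) ** 2))
            bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
            models.append(subset)
            bics.append(bic)

    bics = np.array(bics)
    # Subtract the minimum BIC before exponentiating for numerical stability.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()

    importance = np.zeros(p)
    for subset, w in zip(models, weights):
        for j in subset:
            importance[j] += w
    return importance


# Simulated example (hypothetical data): only the first two inputs matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)
importance = weighted_importance(X, y)
print(np.round(importance, 3))
```

In this sketch, when the weights pile up on models near the true one, the variables in the true model receive importance close to 1 and the remaining variables receive much less, which mirrors the inclusion/exclusion behaviour described in the abstract.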

Description

University of Minnesota Ph.D. dissertation. June 2019. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); viii, 139 pages.

Collections

Dissertations

Suggested citation

Ye, Chenglong. (2019). Model Selection and Estimation for High-dimensional Data Analysis. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/206401.

