Model Selection and Estimation for High-dimensional Data Analysis
2019-06
Authors
Ye, Chenglong
Type
Thesis or Dissertation
Abstract
In the era of big data, uncovering useful information and hidden patterns in data has become a central task in many fields. However, it is challenging to select input variables effectively and to estimate their effects. In this thesis, our goal is to develop reproducible statistical approaches that provide mechanistic explanations of the phenomena observed in big data analysis. The thesis contains two parts: variable selection and model estimation. The first part investigates how to measure and interpret the usefulness of an input variable using an approach called “variable importance learning”, and builds tools (methodology and software) that can be widely applied. We propose two variable importance measures: a parametric measure, SOIL, based on the idea of model combining, and a nonparametric measure, CVIL, based on cross validation. The SOIL method is shown theoretically to have the inclusion/exclusion property: when the model weights are properly concentrated around the true model, the SOIL importance separates the variables in the true model from the rest. The CVIL method possesses desirable theoretical properties and enhances the interpretability of many effective but opaque machine learning methods. The second part focuses on how to estimate the effect of a useful input variable when an interaction between two input variables exists. We investigate the minimax rate of convergence for regression estimation in high-dimensional sparse linear models with two-way interactions, and construct an adaptive estimator that achieves the minimax rate of convergence regardless of the true heredity condition and the sparsity indices.
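The abstract describes SOIL as a parametric importance measure built on model combining. One way to read that idea is that a variable's importance is the total weight of the candidate models that include it, so that when the weights concentrate on the true model, true variables score near 1 and the others near 0. The Python sketch below illustrates only this weighted-inclusion idea, under assumptions not taken from the thesis: the candidate set is every subset of a small number of predictors, and the weights are proportional to exp(-BIC/2). The helper names `bic_weights` and `soil_importance` are hypothetical. The thesis's own weighting scheme and candidate-model construction may differ.

```python
import itertools

import numpy as np


def bic_weights(X, y, models):
    """Hypothetical helper: weights w_k proportional to exp(-BIC_k / 2).

    Assumption for illustration only; not necessarily the thesis's weighting.
    Models are fit by least squares without an intercept.
    """
    n = len(y)
    bics = []
    for model in models:
        cols = list(model)
        if cols:
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ beta) ** 2))
        else:
            rss = float(np.sum(y ** 2))  # empty model: no predictors, no intercept
        bics.append(n * np.log(rss / n) + len(cols) * np.log(n))
    bics = np.array(bics)
    # Subtract the minimum BIC before exponentiating for numerical stability.
    w = np.exp(-(bics - bics.min()) / 2.0)
    return w / w.sum()


def soil_importance(models, weights, p):
    """Importance of variable j = total weight of the candidate models containing j."""
    importance = np.zeros(p)
    for model, w in zip(models, weights):
        for j in model:
            importance[j] += w
    return importance


# Simulated example: only the first two of five predictors matter.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)

# Exhaustive candidate set of all 2**p subsets (feasible only for small p).
models = [m for k in range(p + 1) for m in itertools.combinations(range(p), k)]
weights = bic_weights(X, y, models)
importance = soil_importance(models, weights, p)
print(np.round(importance, 3))
```

Because the importances are sums of normalized weights, each lies in [0, 1]. In this simulated setting, the two active predictors should receive importance close to 1. Exhaustive enumeration is only a stand-in here. In high dimensions, a candidate set would have to be built some other way, for example from penalized-regression solution paths.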
Description
University of Minnesota Ph.D. dissertation. June 2019. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); viii, 139 pages.
Suggested citation
Ye, Chenglong. (2019). Model Selection and Estimation for High-dimensional Data Analysis. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/206401.