Variable selection in high-dimensional classification
2013-06
Authors
Mai, Qing
Type
Thesis or Dissertation
Abstract
Classification has long been an important research topic for statisticians. Nowadays, scientists are further challenged by classification problems for high-dimensional datasets in various fields, ranging from genomics and economics to machine learning. For such massive datasets, classical classification techniques can be inefficient or even infeasible, and new techniques are highly sought after.
My dissertation work tackles high-dimensional classification problems through variable selection. In particular, three methods are proposed and studied: direct sparse discriminant analysis, semiparametric sparse discriminant analysis, and the Kolmogorov filter.
In the proposal of direct sparse discriminant analysis (DSDA), I first point out a disadvantage shared by many current methods: they ignore the correlation structure among predictors. DSDA is then proposed to extend the well-known linear discriminant analysis to high dimensions while fully respecting that correlation structure. The proposal is efficient and consistent, with excellent numerical performance. In addition, I study its connections to several popular proposals for linear discriminant analysis in high dimensions, including L1-Fisher's discriminant analysis and sparse optimal scoring.
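One common way to obtain a sparse discriminant direction that still accounts for predictor correlations is to recode the two class labels and fit a lasso-penalized least-squares regression. The sketch below illustrates that general idea; the recoding scheme, the `alpha` value, and the function name `sparse_lda_direction` are assumptions for illustration, not the dissertation's exact formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_lda_direction(X, y, alpha=0.1):
    """Sketch of a sparse discriminant direction for binary classification.

    The two classes (coded 0/1 in y) are recoded numerically, and a lasso
    regression of the recoded response on X yields a sparse coefficient
    vector.  Because the fit involves all predictors jointly, correlations
    among predictors influence which variables are selected -- unlike
    marginal, one-variable-at-a-time screening.
    """
    y = np.asarray(y)
    n = len(y)
    n1, n0 = np.sum(y == 1), np.sum(y == 0)
    # Illustrative recoding: opposite-signed scores for the two classes.
    z = np.where(y == 1, n / n1, -n / n0)
    model = Lasso(alpha=alpha, fit_intercept=True).fit(X, z)
    return model.coef_  # nonzero entries mark the selected variables
```

A new observation could then be classified by the sign of its inner product with the returned direction (plus a suitable intercept).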
Semiparametric sparse discriminant analysis (SeSDA) extends DSDA by relaxing the normality assumption that is fundamental to any method built on the linear discriminant analysis model. SeSDA is more robust than DSDA while preserving its good properties. Along with the development of SeSDA, a new concentration inequality is obtained that provides theoretical justification for methods based on Gaussian copulas.
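A standard device in Gaussian-copula (semiparametric) models is a rank-based normal-score transform of each predictor, after which a Gaussian-model method such as DSDA can be applied. The sketch below shows that transform under those assumptions; it is a generic illustration, not SeSDA's specific estimator.

```python
import numpy as np
from scipy.stats import norm, rankdata

def copula_normal_scores(X):
    """Rank-based normal-score transform, column by column.

    Each predictor x_j is mapped to Phi^{-1}(rank / (n + 1)), which is
    monotone in x_j and approximately standard normal marginally.  Under a
    Gaussian-copula model this recovers (approximate) joint normality
    without assuming the original marginals are normal.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    ranks = np.apply_along_axis(rankdata, 0, X)  # within-column ranks
    return norm.ppf(ranks / (n + 1.0))           # strictly inside (0, 1)
```

Because the transform is strictly monotone in each column, it preserves the ordering of observations while removing marginal non-normality (e.g., skewness in gene-expression data).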
Moreover, the Kolmogorov filter is proposed as a fully nonparametric variable selection method for high-dimensional classification. It requires minimal assumptions on the distribution of the predictors and is supported by both theoretical results and numerical examples. Finally, potential future work on variable selection in classification is discussed.
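As the name suggests, a nonparametric screen of this kind can be sketched by ranking predictors with the two-sample Kolmogorov–Smirnov statistic between the class-conditional samples, which makes no distributional assumptions. The function name `ks_screen` and the cutoff rule (keep the top `d`) are illustrative choices, not the dissertation's exact procedure.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_screen(X, y, d):
    """Nonparametric variable screening for binary classification.

    For each predictor, compute the two-sample Kolmogorov-Smirnov
    statistic between its values in class 1 and class 0; predictors whose
    class-conditional distributions differ most get the largest statistics.
    Return the indices of the top-d predictors.
    """
    X, y = np.asarray(X), np.asarray(y)
    stats = np.array([
        ks_2samp(X[y == 1, j], X[y == 0, j]).statistic
        for j in range(X.shape[1])
    ])
    return np.argsort(stats)[::-1][:d]  # indices, most separated first
```

Because the KS statistic depends only on the empirical distribution functions, the ranking is invariant to monotone transformations of each predictor.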
Description
University of Minnesota Ph.D. dissertation. June 2013. Major: Statistics. Advisor: Hui Zou. 1 computer file (PDF); ix, 103 pages, appendices A-D.
Suggested citation
Mai, Qing. (2013). Variable selection in high-dimensional classification. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/158803.