Variable selection in high-dimensional classification

Published Date

2013-06

Type

Thesis or Dissertation

Abstract

Classification has long been an important research topic for statisticians. Nowadays, scientists face classification problems for high-dimensional datasets in fields ranging from genomics and economics to machine learning. For such massive datasets, classical classification techniques may be inefficient or even infeasible, so new techniques are highly sought after. My dissertation tackles high-dimensional classification problems through variable selection. In particular, three methods are proposed and studied: direct sparse discriminant analysis, semiparametric sparse discriminant analysis, and the Kolmogorov filter.

In proposing direct sparse discriminant analysis (DSDA), I first point out a disadvantage of many current methods: they ignore the correlation structure among the predictors. DSDA then extends the well-known linear discriminant analysis to high dimensions while fully respecting that correlation structure. The proposal is efficient and consistent, with excellent numerical performance. In addition, I study its connections to several popular proposals of linear discriminant analysis in high dimensions, including L1-Fisher's discriminant analysis and sparse optimal scoring.

Semiparametric sparse discriminant analysis (SeSDA) extends DSDA by relaxing the normality assumption, which is fundamental to any method based on the linear discriminant analysis model. SeSDA is more robust than DSDA while preserving its good properties. Along with the development of SeSDA, a new concentration inequality is obtained that provides theoretical justification for methods based on Gaussian copulas.

Finally, the Kolmogorov filter is proposed as a fully nonparametric method that performs variable selection for high-dimensional classification. It requires minimal assumptions on the distribution of the predictors and is supported by both theory and numerical examples. Some potential future work on variable selection in classification is also discussed.
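The Kolmogorov filter described in the abstract screens predictors by how far apart their class-conditional distributions are. A minimal sketch of that idea, not the dissertation's implementation: for a two-class problem, rank each predictor by the two-sample Kolmogorov-Smirnov statistic between the two class samples and keep the top-ranked ones (the function name, the toy data, and the use of SciPy's `ks_2samp` are illustrative choices, not from the source).

```python
import numpy as np
from scipy.stats import ks_2samp

def kolmogorov_filter(X, y, d):
    """Rank predictors by the two-sample Kolmogorov-Smirnov
    statistic between the two class-conditional samples and
    return the indices of the d highest-ranked predictors."""
    X0, X1 = X[y == 0], X[y == 1]
    stats = np.array([ks_2samp(X0[:, j], X1[:, j]).statistic
                      for j in range(X.shape[1])])
    return np.argsort(stats)[::-1][:d]

# Toy example: only predictor 0 differs between the classes.
rng = np.random.default_rng(0)
n, p = 200, 50
y = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, p))
X[y == 1, 0] += 2.0          # mean shift in class 1 for predictor 0
keep = kolmogorov_filter(X, y, d=5)
```

Because the statistic depends only on empirical distribution functions, the ranking is invariant to monotone transformations of each predictor, which is what makes the screening fully nonparametric.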

Description

University of Minnesota Ph.D. dissertation. June 2013. Major: Statistics. Advisor: Hui Zou. 1 computer file (PDF); ix, 103 pages, appendices A-D.

Suggested citation

Mai, Qing. (2013). Variable selection in high-dimensional classification. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/158803.
