Modern Classification with Big Data

Title

Modern Classification with Big Data

Published Date

2018-07

Type

Thesis or Dissertation

Abstract

Rapid advances in information technology have ushered in the era of "big data" and revolutionized scientific research across many disciplines, including economics, genomics, neuroscience, and modern commerce. Big data creates golden opportunities but has also given rise to unprecedented challenges due to the massive size and complex structure of the data. Among the many tasks in statistics and machine learning, classification has diverse applications, ranging from improving daily life to reaching the new frontiers of science and engineering. This thesis envisions broader approaches to modern classification methodology and discusses the computational considerations needed to cope with big data challenges. Chapter 2 presents a modern classification method named data-driven generalized distance weighted discrimination (DWD). We introduce a fast algorithm with an emphasis on computational efficiency for big data. Our method is formulated in a reproducing kernel Hilbert space, and we develop learning theory establishing its Bayes risk consistency. We use extensive benchmark data applications to demonstrate that the prediction accuracy of our method is highly competitive with state-of-the-art classification methods, including the support vector machine, random forest, gradient boosting, and deep neural networks. Chapter 3 introduces sparse penalized DWD for high-dimensional classification, a task commonly encountered in the era of big data. We develop a very efficient algorithm to compute the solution path of the sparse DWD over a given fine grid of regularization parameters. Chapter 4 proposes multicategory kernel distance weighted discrimination for multi-class classification. The proposal is defined as a margin-vector optimization problem in a reproducing kernel Hilbert space. This formulation is shown to enjoy Fisher consistency. We develop an accelerated projected gradient descent algorithm to fit multicategory kernel DWD. Chapter 5 develops a magic formula for cross-validation (CV) in the context of large margin classification. We design a novel and successful algorithm to fit and tune the support vector machine.

Description

University of Minnesota Ph.D. dissertation. July 2018. Major: Statistics. Advisor: Hui Zou. 1 computer file (PDF); viii, 115 pages.

Suggested citation

Wang, Boxiang. (2018). Modern Classification with Big Data. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/216325.
