Diagnostics, Cooperation, and Model Selection for Modern Machine Learning

Published Date

2023-05

Type

Thesis or Dissertation

Abstract

Rapid developments in data collection, modeling, and computation tools have offered unlimited opportunities for problem-solving through data-driven modeling. However, data can be very complicated, noisy, or manipulated, which brings many difficulties to modeling in machine learning. To address this issue, we develop tools to handle the challenges in three fundamental and interconnected aspects of machine learning: diagnostics, improvement, and model selection.

For “diagnostics,” we focus on classification problems. To the best of our knowledge, no existing method can assess the goodness-of-fit of general classification procedures, including Random Forest, Boosting, and neural networks. Indeed, the lack of a parametric assumption makes it challenging to construct proper tests. To overcome this difficulty, we propose a model-free goodness-of-fit assessment tool called BAGofT, which is based on data-splitting.

For “improvement,” we focus on model training with external assistance. Advancements in data collection methods have brought opportunities for model improvement from an external party with different but related datasets. Nevertheless, communication between different parties can be costly and restricted due to large data size, bandwidth limitations, and privacy regulations. To facilitate modeling in this scenario, we develop a decentralized learning framework called additive-effect assisted learning.

For “selection,” we focus on model selection problems where the distribution of the training data may differ from the distribution on which we want to evaluate the model. To address this issue, we develop a model selection method named targeted CV, which uses a problem-specific weighting function.

Description

University of Minnesota Ph.D. dissertation. May 2023. Major: Statistics. Advisors: Yuhong Yang, Jie Ding. 1 computer file (PDF); x, 202 pages.

Suggested citation

Zhang, Jiawei. (2023). Diagnostics, Cooperation, and Model Selection for Modern Machine Learning. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/258701.
