Author: Zhang, Jiawei
Available: 2023-11-28
Issued: 2023-05
URI: https://hdl.handle.net/11299/258701
Description: University of Minnesota Ph.D. dissertation. May 2023. Major: Statistics. Advisors: Yuhong Yang, Jie Ding. 1 computer file (PDF); x, 202 pages.
Title: Diagnostics, Cooperation, and Model Selection for Modern Machine Learning
Type: Thesis or Dissertation
Language: en

Abstract:

Rapid developments in data collection, modeling, and computational tools have opened vast opportunities for problem-solving through data-driven modeling. However, data can be complicated, noisy, or manipulated, which creates many difficulties for machine-learning modeling. To address these challenges, we develop tools for three fundamental and interconnected aspects of machine learning: diagnostics, improvement, and model selection.

For “diagnostics,” we focus on classification problems. To the best of our knowledge, no existing method can assess the goodness of fit of general classification procedures, including random forests, boosting, and neural networks. Indeed, the lack of a parametric assumption makes it challenging to construct proper tests. To overcome this difficulty, we propose a model-free goodness-of-fit assessment tool called BAGofT, based on data splitting.

For “improvement,” we focus on model training with external assistance. Advances in data collection have created opportunities for model improvement with the help of an external party holding different but related datasets. Nevertheless, communication between parties can be costly and restricted because of large data sizes, bandwidth limitations, and privacy regulations. To facilitate modeling in this scenario, we develop a decentralized learning framework called additive-effect assisted learning.

For “selection,” we focus on model selection problems in which the distribution of the training data may differ from the distribution under which we want to evaluate the model. To address this issue, we develop a model selection method named targeted CV, which uses a problem-specific weighting function.
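The data-splitting idea behind model-free goodness-of-fit assessment can be illustrated with a minimal sketch: hold out part of the data, group held-out points by the classifier's predicted probabilities, and compare observed versus predicted positive counts within each group. This is only a simplified illustration of the general data-splitting approach, not the BAGofT procedure from the dissertation; the classifier stand-in, the quantile grouping rule, and the chi-square-style statistic are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated held-out split: binary labels from a logistic model.
n = 2000
X_test = rng.normal(size=(n, 3))
beta = np.array([1.0, -1.0, 0.5])
true_p = 1.0 / (1.0 + np.exp(-(X_test @ beta)))
y_test = rng.binomial(1, true_p)

def predict_proba(X):
    """Stand-in for any fitted black-box classifier's probability output.
    Here it happens to match the data-generating model (well specified)."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

def splitting_gof_statistic(predict_proba, X_test, y_test, n_groups=5):
    """Group held-out points by predicted probability and accumulate a
    chi-square-style discrepancy between observed and predicted positives."""
    p = predict_proba(X_test)
    edges = np.quantile(p, np.linspace(0.0, 1.0, n_groups + 1))
    groups = np.clip(np.searchsorted(edges, p, side="right") - 1, 0, n_groups - 1)
    stat = 0.0
    for g in range(n_groups):
        mask = groups == g
        if not mask.any():
            continue
        expected = p[mask].sum()        # predicted number of positives
        observed = y_test[mask].sum()   # actual number of positives
        var = np.sum(p[mask] * (1.0 - p[mask])) + 1e-12
        stat += (observed - expected) ** 2 / var
    return stat

stat = splitting_gof_statistic(predict_proba, X_test, y_test)
```

A large value of `stat` on the held-out split would suggest lack of fit; because the model here matches the data-generating process, the statistic should stay moderate.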
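The idea of model selection under a training/evaluation distribution mismatch can be sketched with importance-weighted cross-validation: validation losses are reweighted by a problem-specific weighting function so the CV estimate targets risk under the evaluation distribution. This generic sketch is not the dissertation's targeted CV method; the polynomial candidate models and the weighting function `weight` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training data; suppose evaluation emphasizes regions with large x,
# encoded by a weighting function (a hypothetical density ratio).
n = 500
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

def weight(x):
    """Hypothetical target/training density ratio for illustration."""
    return np.exp(x)

def weighted_cv_mse(degree, x, y, k=5):
    """k-fold CV for a polynomial fit, with losses reweighted toward
    the evaluation distribution via weight(x)."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    num = den = 0.0
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        coef = np.polyfit(x[train], y[train], degree)
        resid = y[fold] - np.polyval(coef, x[fold])
        w = weight(x[fold])
        num += np.sum(w * resid ** 2)
        den += np.sum(w)
    return num / den

# Select among candidate polynomial degrees by weighted CV risk.
scores = {d: weighted_cv_mse(d, x, y) for d in (1, 2, 3)}
best = min(scores, key=scores.get)
```

Unweighted CV would estimate risk under the training distribution; the weights shift the estimate toward where the model will actually be evaluated.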