He, Sijie
2022-06-08
2022-06-08
2022-03
https://hdl.handle.net/11299/227920
University of Minnesota Ph.D. dissertation. 2022. Major: Computer Science. Advisor: Arindam Banerjee. 1 computer file (PDF); 145 pages.
Climate change is considered one of the greatest challenges for humanity in the twenty-first century. The changing climate affects almost every aspect of people's lives, including but not limited to water, energy, agriculture, ecosystems, economics, safety, and health. In the past decades, due to climate change, extreme events such as wildfires, droughts, and flooding have become more frequent and intense, which can cause devastating economic loss and humanitarian crises. Therefore, skillful climate modeling, which can improve the understanding and predictability of climate behavior, would have immense societal value. In climate science, climate models are used to represent the major climate system components (atmosphere, land, ocean, and sea ice) and their interactions. A climate model consists of mathematical equations derived from fundamental laws of physics, which need to be solved using powerful supercomputers. In general, climate models are an important tool for understanding climate change, and they continually become more complete and accurate. Nevertheless, the Earth's climate system is too complex to be fully simulated. State-of-the-art climate models cannot yet fulfill all needs in understanding and forecasting climate behavior, which leaves open opportunities for interdisciplinary climate studies. In the past decades, machine learning (ML), especially deep learning, has made remarkable strides in wide-ranging applications. The emergence of climate data with high spatiotemporal resolution also makes it possible to tackle complex climate problems using machine learning techniques.
Recent studies have shown the effectiveness of machine learning approaches on various tasks, including weather prediction, climate forecasting, and the detection of weather extremes. This dissertation explores how machine learning techniques can make advances in solving two fundamental problems in climate science. The first problem is understanding the dependencies among or within key components of the climate system. Two types of machine learning models are proposed to address this problem: a high-dimensional structure learning model and a regularized regression model. First, we propose a novel high-dimensional structure learning algorithm for estimating the underlying dependency structure (interactions) among different spatial locations around the globe in global atmospheric circulation. Second, to obtain a better understanding of the predictive relationships between land and ocean climate variables, we introduce a weighted Lasso model for land temperature prediction using sea surface temperatures and establish finite-sample estimation error bounds for the proposed model. The second climate problem that we target is sub-seasonal forecasting (SSF), the prediction of key climate variables, e.g., temperature and precipitation, on a 2-week to 2-month horizon. We investigate 10 machine learning models on SSF over the contiguous United States (U.S.). The experimental results indicate that suitable ML models can outperform commonly used climate baselines and, to some extent, capture the predictability on sub-seasonal time scales. In addition, we perform a fine-grained comparison of a suite of modern ML models with state-of-the-art physics-based dynamical models for SSF over the western U.S. We carefully analyze the strengths of both types of models and propose incorporating dynamical model forecasts into machine learning modeling, which significantly enhances the forecasting performance of the ML models.
Further, to compensate for the limited availability of climate data for SSF, we work on generating synthetic climate data and propose a novel Vision Transformer-based variational autoencoder (ViT-VAE) model. We compare the proposed model with another dominant type of generative model and show that both models are able to generate realistic synthetic samples that closely match the underlying ground truth distribution.
en
Advancing Climate Science with Machine Learning
Thesis or Dissertation