Advancing Climate Science with Machine Learning

Thesis or Dissertation


Climate change is considered one of the greatest challenges facing humanity in the twenty-first century. The changing climate affects almost every aspect of people's lives, including water, energy, agriculture, ecosystems, economics, safety, and health. In the past decades, extreme events such as wildfires, droughts, and floods have become more frequent and more intense because of climate change, causing devastating economic losses and humanitarian crises. Skillful climate modeling, which can improve the understanding and predictability of climate behavior, would therefore have immense societal value. In climate science, climate models represent the major components of the climate system (atmosphere, land, ocean, and sea ice) and their interactions. A climate model consists of mathematical equations derived from fundamental laws of physics, which must be solved on powerful supercomputers. Climate models are an important tool for understanding climate change, and they continue to become more complete and accurate. Nevertheless, the Earth's climate system is too complex to be fully simulated. State-of-the-art climate models do not yet meet all needs in understanding and forecasting climate behavior, which leaves open opportunities for interdisciplinary climate studies.
In the past decades, machine learning (ML), especially deep learning, has made remarkable strides in a wide range of applications. The emergence of climate data with high spatiotemporal resolution also makes it possible to tackle complex climate problems with machine learning techniques. Recent studies have shown the effectiveness of machine learning approaches on tasks including weather prediction, climate forecasting, and the detection of weather extremes. This dissertation explores how machine learning techniques can advance the solution of two fundamental problems in climate science.
The first problem is understanding the dependencies among or within key components of the climate system. Two types of machine learning models are proposed to address it: a high-dimensional structure learning model and a regularized regression model. First, we propose a novel high-dimensional structure learning algorithm for estimating the underlying dependency structure (interactions) among different spatial locations around the globe in global atmospheric circulation. Second, to better understand the predictive relationships between land and ocean climate variables, we introduce a weighted Lasso model for predicting land temperature from sea surface temperatures and establish finite-sample estimation error bounds for the proposed model.
The second problem is sub-seasonal forecasting (SSF), the prediction of key climate variables, e.g., temperature and precipitation, on a 2-week to 2-month horizon. We investigate 10 machine learning models for SSF over the contiguous United States (U.S.). The experimental results indicate that suitable ML models can outperform commonly used climate baselines and, to some extent, capture predictability on sub-seasonal time scales. In addition, we perform a fine-grained comparison of a suite of modern ML models with state-of-the-art physics-based dynamical models for SSF over the western U.S. We carefully analyze the strengths of both types of models and propose incorporating dynamical model forecasts into machine learning modeling, which significantly enhances the forecasting performance of the ML models. Further, to compensate for the limited availability of climate data for SSF, we work on generating synthetic climate data and propose a novel Vision Transformer-based variational autoencoder (ViT-VAE) model. We compare the proposed model with another dominant type of generative model and show that both models can generate realistic synthetic samples that closely match the underlying ground truth distribution.
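As an illustration of the weighted Lasso idea mentioned in the abstract, the following is a minimal sketch, not the dissertation's implementation. It assumes the common formulation that minimizes (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| and solves it with proximal gradient descent (ISTA). The function names, the synthetic data, and the weight values are all hypothetical; the abstract does not say how the dissertation chooses its weights or solver.

```python
import numpy as np

def soft_threshold(z, t):
    # Elementwise soft-thresholding: proximal operator of the weighted L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(X, y, beta, weights, lam):
    # (1/2n) * squared error plus the weighted L1 penalty.
    n = X.shape[0]
    resid = y - X @ beta
    return 0.5 * resid @ resid / n + lam * np.sum(weights * np.abs(beta))

def weighted_lasso(X, y, weights, lam, n_iter=500):
    # Proximal gradient descent (ISTA) with a fixed step size 1/L,
    # where L is the Lipschitz constant of the gradient of the smooth term.
    n, p = X.shape
    beta = np.zeros(p)
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam * weights)
    return beta

# Synthetic stand-in data (assumption): rows are time steps, columns play the
# role of sea surface temperature features, y plays the role of a land temperature.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:3] = [1.5, -2.0, 1.0]
y_demo = X_demo @ beta_true + 0.1 * rng.standard_normal(200)

# Hypothetical weights: penalize the last 25 features twice as heavily.
w_demo = np.ones(50)
w_demo[25:] = 2.0

beta_hat = weighted_lasso(X_demo, y_demo, w_demo, lam=0.05)
print("number of nonzero coefficients:", int(np.count_nonzero(beta_hat)))
```

Larger weights shrink the corresponding coefficients more aggressively, so a weighting scheme is one way to encode prior beliefs about which predictors matter; with all weights equal to one, the sketch reduces to the standard Lasso.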


University of Minnesota Ph.D. dissertation. 2022. Major: Computer Science. Advisor: Arindam Banerjee. 1 computer file (PDF); 145 pages.

Suggested citation

He, Sijie. (2022). Advancing Climate Science with Machine Learning. Retrieved from the University Digital Conservancy,
