High Dimensional Statistical Models: Applications to Climate

Published Date

2015-09

Type

Thesis or Dissertation

Abstract

Recent years have seen enormous growth in the collection and curation of datasets in various domains, which often involve thousands or even millions of variables. Examples include social networking websites, geophysical sensor networks, cancer genomics, climate science, and many more. In many applications, it is of prime interest to understand the dependencies between variables, so that predictive models can be designed from knowledge of these dependencies. However, traditional statistical methods, such as least squares regression, are often inapplicable to such tasks because the available sample size is much smaller than the problem dimensionality. We therefore require new models and methods for statistical data analysis that provide provable estimation guarantees even in such high dimensional scenarios. We further require that such models admit efficient implementation and optimization routines. Statistical models that satisfy both criteria will be important for solving prediction problems in many scientific domains.

High dimensional statistical models have attracted interest from both the theoretical and applied machine learning communities in recent years. Of particular interest are parametric models, which consider estimation of coefficient vectors in the scenario where the sample size is much smaller than the dimensionality of the problem. Although most existing work focuses on analyzing sparse regression methods using L1 norm regularizers, there exist other "structured" norm regularizers that encode more interesting structure in the sparsity induced on the estimated regression coefficients.

In the first part of this thesis, we conduct a theoretical study of such structured regression methods. First, we prove statistical consistency of regression with a hierarchical tree-structured norm regularizer known as hiLasso. Second, we formulate a generalization of the popular Dantzig Selector for sparse linear regression to any norm regularizer, called the Generalized Dantzig Selector, and provide statistical consistency guarantees for estimation. Further, we provide the first known results on non-asymptotic rates of consistency for the recently proposed k-support norm regularizer. Finally, we show that in the presence of measurement errors in covariates, the tools we use for proving consistency in the noiseless setting are inadequate for proving statistical consistency.

In the second part of the thesis, we consider the application of regularized regression methods to statistical modeling problems in climate science. First, we consider the application of Sparse Group Lasso, a special case of hiLasso, to predictive modeling of land climate variables from measurements of atmospheric variables over oceans. Extensive experiments illustrate that structured sparse regression provides both better performance and more interpretable models than unregularized regression and even unstructured sparse regression methods. Second, we consider the application of regularized regression methods to discovering stable factors for predictive modeling in climate. Specifically, we consider the problem of determining the dominant factors influencing winter precipitation over the Great Lakes region of the US. Using a sparse linear regression method followed by random permutation tests, we mine stable sets of predictive features from a pool of possible predictors. Some of the stable factors discovered through this process are shown to relate to known physical processes influencing precipitation over the Great Lakes.
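
To make the idea of a "structured" regularizer concrete, the following is a minimal sketch of the proximal operator of a Sparse Group Lasso penalty, lam1*||theta||_1 + lam2*sum_g ||theta_g||_2, for non-overlapping groups. It is not taken from the thesis: the function names, the grouping, and the penalty values are hypothetical and chosen only for illustration. The sketch shows how the penalty can zero out individual coefficients inside a retained group as well as entire groups.

```python
import math


def soft_threshold(x, t):
    """Shrink a scalar toward zero by t (proximal operator of t * |x|)."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def prox_sparse_group_lasso(theta, groups, lam1, lam2):
    """Proximal operator of lam1 * ||theta||_1 + lam2 * sum_g ||theta_g||_2.

    `groups` is a list of index lists that partition the coefficients
    (non-overlapping groups are assumed). For this penalty the operator
    is an elementwise soft-threshold followed by a group-wise shrinkage.
    """
    # Step 1: elementwise soft-thresholding (the L1 part).
    out = [soft_threshold(v, lam1) for v in theta]
    # Step 2: shrink each group's Euclidean norm by lam2 (the group part).
    for idx in groups:
        norm = math.sqrt(sum(out[i] ** 2 for i in idx))
        scale = max(1.0 - lam2 / norm, 0.0) if norm > 0 else 0.0
        for i in idx:
            out[i] *= scale
    return out


# Hypothetical coefficients split into two groups of three.
theta = [3.0, 4.5, 0.3, 0.4, -0.2, 1.2]
groups = [[0, 1, 2], [3, 4, 5]]
result = prox_sparse_group_lasso(theta, groups, lam1=0.5, lam2=1.0)
print(result)
```

In this example the first group is kept but its smallest coefficient is set to zero, while the second group is removed entirely. An unstructured L1 penalty alone would not produce this group-level behavior.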

Description

University of Minnesota Ph.D. dissertation. September 2015. Major: Computer Science. Advisor: Arindam Banerjee. 1 computer file (PDF); ix, 103 pages.

Suggested citation

Chatterjee, Soumyadeep. (2015). High Dimensional Statistical Models: Applications to Climate. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/175549.
