Stochastic Algorithms for Large Scale Machine Learning

Published Date

2020-04

Type

Thesis or Dissertation

Abstract

Over the last few years, advances in scalable algorithms, novel learning models, and machine learning software frameworks have led to further breakthroughs in computer vision, natural language processing, and recommendation systems. In the era of big data, large scale machine learning has become increasingly important for training large models on large datasets. However, even with modern distributed systems, the amount of data used today cannot be fully loaded into memory or manipulated directly. Stochastic optimization algorithms reduce the size of the model and data handled at each step of the learning process, and therefore play a key role in building a large scale machine learning system. Modern machine learning faces two kinds of big data: (1) high-dimensional data, where the number of features is much larger than the sample size, and (2) data with a massive sample size. The two types require different treatments. For (1), the challenge lies in finding the intrinsic structure of the model. Norm regularized estimators such as the Lasso are widely used to find a low-complexity parameter with high statistical accuracy. More complicated structures can be constructed from simple structures such as sparsity and low rank, and are shown to be more robust than single-structured models. For (2), carefully designed algorithms that use only a small portion of the data in each operation are necessary. Stochastic gradient descent (SGD) and its variants are well known for their practical effectiveness, and prior information can be incorporated into these algorithms to improve their efficiency while maintaining convergence guarantees. In this thesis, we target a unified understanding of stochastic algorithms for both (1) and (2).

For high-dimensional data, we consider estimation with superposition structure and locally low-rank (LLR) structure, both of which are constructed from simple structures such as sparsity and low-rankness. A parameter is superposition structured if it can be written as a sum of multiple component parameters, each with its own structure. We present a novel estimator that allows the sum of any number of component parameters to be accurately estimated, characterize the sample complexity of the estimator, and give high-probability non-asymptotic bounds on the componentwise estimation error. A matrix is locally low-rank if it contains submatrices that are low rank. We consider a convex relaxation of the LLR structure and propose an efficient algorithm based on dual projected gradient descent (D-PGD) for computing the associated proximal operator. We show that the proposed D-PGD algorithm has a geometric convergence rate and present several practical ways to further speed up the computation.

For massive sample sizes, we develop several novel stochastic optimization algorithms for large empirical risk minimization problems. We study sketched iterative algorithms for the generalized linear model, in particular sketched projected gradient descent (S-PGD) and sketched stochastic variance reduced gradient (S-SVRG) for the structured generalized linear model, and show that these methods converge geometrically to the statistical error under suitable assumptions. We also propose the Yet Another Adaptive SGD (YAA-SGD) algorithm, based on the expected change in function value expressed in terms of the first and second moments of the stochastic gradients. YAA-SGD step sizes depend on the ratio of the first and second moments and do not need to be monotonically decreasing. Further, the algorithm provably converges at an $O(1/\sqrt{T})$ rate for non-convex problems, and we present empirical results supporting its effectiveness.
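The abstract only names the YAA-SGD step-size rule; the exact update appears in the thesis itself. As a rough illustration of the idea that step sizes can be driven by the ratio of the first and second moments of the stochastic gradients, rather than by a fixed decreasing schedule, here is a minimal Python sketch. The least-squares objective, the exponential moving averages, and all constants are assumptions made for this example, not the thesis's algorithm.

```python
# Illustrative sketch (not the thesis's exact YAA-SGD update): SGD with a step
# size driven by running estimates of the first and second moments of the
# stochastic gradients. Objective, decay rate, and base step size are assumed.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (assumed for illustration).
n, d = 1000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def stochastic_grad(w, batch_size=32):
    """Mini-batch gradient of the least-squares loss 0.5 * ||Xw - y||^2 / n."""
    idx = rng.integers(0, n, size=batch_size)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

w = np.zeros(d)
m1 = np.zeros(d)        # running estimate of E[g]        (first moment)
m2 = 1e-8               # running estimate of E[||g||^2]  (second moment)
beta, eta0 = 0.9, 0.5   # decay rate and base step size (assumed values)

for t in range(1, 2001):
    g = stochastic_grad(w)
    m1 = beta * m1 + (1 - beta) * g
    m2 = beta * m2 + (1 - beta) * float(g @ g)
    # Step size proportional to ||E[g]||^2 / E[||g||^2]: it grows when the
    # stochastic gradients agree (low noise) and shrinks when they are noisy,
    # so it need not be monotonically decreasing.
    eta = eta0 * float(m1 @ m1) / m2
    w -= eta * g

print("estimation error:", np.linalg.norm(w - w_true))
```

The point of the sketch is only the shape of the rule: the moment ratio acts as an automatic noise-adaptive scaling of a base step size, in contrast to a pre-specified $O(1/\sqrt{t})$ schedule.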

Description

University of Minnesota Ph.D. dissertation. April 2020. Major: Computer Science. Advisor: Arindam Banerjee. 1 computer file (PDF); xi, 162 pages.

Suggested citation

Gu, Qilong. (2020). Stochastic Algorithms for Large Scale Machine Learning. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/216391.
