Between Dec 19, 2024 and Jan 2, 2025, datasets can be submitted to DRUM but will not be processed until after the break. Staff will not be available to answer email during this period, and will not be able to provide DOIs until after Jan 2. If you are in need of a DOI during this period, consider Dryad or OpenICPSR. Submission responses to the UDC may also be delayed during this time.
 

On Variance Reduction in Machine Learning

Loading...
Thumbnail Image

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

On Variance Reduction in Machine Learning

Published Date

2021-04

Publisher

Type

Thesis or Dissertation

Abstract

The quick evolution and widespread applicability of machine learning and artificial intelligence have fundamentally shaped and transcended modern life. Modern machine learning tasks built to learn from data can be typically formulated as empirical risk minimization (ERM) problems. The large amount of data justifies the pressing need for efficient and scalable optimization algorithms that are designed specifically for learning with big data. The main theme of Chapter 2 is a unifying algorithm, \textbf{L}oop\textbf{L}ess \textbf{S}ARAH (L2S) for ERM problems formulated as summation of $n$ individual loss functions. L2S broadens a recently developed variance reduction method known as SARAH. To find an $\epsilon$-accurate solution, L2S enjoys a complexity of ${\cal O}\big( (n+\kappa) \ln (1/\epsilon)\big)$ for strongly convex problems. For convex problems, when adopting an $n$-dependent step size, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$; while for more frequently adopted $n$-independent step size, the complexity is ${\cal O}(n+ n/\epsilon)$. Distinct from SARAH, our theoretical findings support an $n$-independent step size in convex problems without extra assumptions. For nonconvex problems, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$. Our numerical tests on neural networks suggest that L2S can have better generalization properties than SARAH. Along with L2S, our side results include the linear convergence of the last iteration for SARAH in strongly convex problems. However, variance reduced algorithms typically require grid search to tune parameters (step size and the number of iterations per inner loop) for optimal performance. To overcome this issue, Chapter 3 introduces `almost tune-free' SVRG and SARAH schemes equipped with i) Barzilai-Borwein (BB) step sizes; ii) averaging; and, iii) the inner loop length adjusted to the BB step sizes. In particular, SVRG, SARAH, and their BB variants are first reexamined through an `estimate sequence' lens to enable new averaging methods that tighten their convergence rates theoretically, and improve their performance empirically when the step size or the inner loop length is chosen large. Then a simple yet effective means to adjust the number of iterations per inner loop is developed to enhance the merits of the proposed averaging schemes and BB step sizes. Numerical tests corroborate the proposed methods. The main goal of Chapter 4 is equipping convex and nonconvex problems with Barzilai-Borwein (BB) step size. With the adaptivity of BB step sizes granted, they can fail when the objective function is not strongly convex. To overcome this challenge, the key idea here is to bridge (non)convex problems and strongly convex ones via regularization. The proposed regularization schemes are \textit{simple} yet effective. Wedding the BB step size with a variance reduction method, e.g., SARAH, offers a free lunch compared with vanilla SARAH in convex problems. The convergence of BB step sizes in nonconvex problems is also established and its complexity is no worse than other adaptive step sizes such as AdaGrad. As a byproduct, our regularized SARAH methods for convex functions ensure that the complexity to find $\mathbb{E}[\| \nabla f(\mathbf{x}) \|^2]\leq \epsilon$ is ${\cal O}\big( (n+\frac{1}{\sqrt{\epsilon}})\ln{\frac{1}{\epsilon}}\big)$, improving $\epsilon$ dependence over existing results. Numerical tests further validate the merits of proposed approaches.

Description

University of Minnesota M.S.E.C.E. thesis. April 2021. Major: Electrical Engineering. Advisor: Georgios Giannakis. 1 computer file (PDF); ix, 89 pages.

Related to

Replaces

License

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation

Li, Bingcong. (2021). On Variance Reduction in Machine Learning. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/224528.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.