Understanding Adaptivity in Machine Learning Optimization: Theories and Algorithms

Chen, Xiangyi2022-08-292022-08-292022-05https://hdl.handle.net/11299/241368University of Minnesota Ph.D. dissertation. May 2022. Major: Electrical Engineering. Advisor: Mingyi Hong. 1 computer file (PDF); vi, 134 pages.Optimization plays an indispensable role in modern machine learning, due to its necessity in many aspects, especially in model training. Over the past decade, the rapid development of research in deep machine learning models posed many new challenges for machine learning optimization. As a result, designing efficient and robust optimization algorithms remains an active research area within machine learning. In addition, some new and notable optimization algorithms were proposed to tackle the new challenges in model training. An important class of the new algorithms is motivated by the idea of incorporating adaptation into algorithm design, so that the algorithms can adapt to the local geometry of the optimization landscape. However, some of these newly designed algorithms with adaptation are tailored to achieve superior empirical performance for certain classes of optimization problems but are not well understood theoretically. Thus, the performance of these algorithms is less predictable in other domains or applications. In this thesis, we try to build theories for some algorithms with adaptation. In particular, the result of this thesis can be separated into three parts. In the first part, we analyze a class of algorithms with adaptation, which we call Adam-type algorithms, for nonconvex unconstrained optimization. We provide conditions for these algorithms to converge and shed light on design principles for this class of algorithms. In the second part, we extend the previous analysis to zeroth-order constrained/unconstrained optimization and propose an algorithm called ZO-AdaMM, which has superior performance in generating black-box adversarial attacks. In the third part, we study the gradient clipping operation for differentially private SGD. Gradient clipping adds a form of adaptation to SGD that can potentially hurt convergence. We identify regimes where gradient clipping is not an issue and verify the existence of these regimes in practice. Further, we provide a perturbation mechanism to mitigate the adversarial effect caused by gradient clipping.enadaptive methodsmachine learningoptimizationUnderstanding Adaptivity in Machine Learning Optimization: Theories and AlgorithmsThesis or Dissertation