Title: A Robust Control Perspective on Optimization of Strongly-Convex Functions
Author: Hu, Bin
Type: Thesis or Dissertation
Date issued: 2016-07
Date available: 2016-09-19
Persistent link: https://hdl.handle.net/11299/182283
Description: University of Minnesota Ph.D. dissertation. July 2016. Major: Aerospace Engineering and Mechanics. Advisor: Peter Seiler. 1 computer file (PDF); viii, 128 pages.
Language: en
Keywords: Integral Quadratic Constraints; Large-Scale Optimization; Machine Learning; Robust Control; Stochastic Average Gradient; Stochastic Gradient

Abstract:
Large-scale optimization is a central topic in big data science. First-order black-box optimization methods have been widely applied in machine learning problems, since the oracle complexity of these methods can be independent of the parameter dimension. In this dissertation, we formulate linear matrix inequality (LMI) conditions to analyze the convergence rates of various deterministic and stochastic optimization methods. We derive these LMIs using integral quadratic constraints (IQCs) and dissipation inequalities.

The first part of this dissertation formulates linear rate analysis of deterministic first-order methods (gradient descent, Nesterov's method, etc.) as generalized eigenvalue problems (GEVPs). A standard dissipation inequality requires a non-negative definite storage function and "hard" IQCs, which must hold over all finite time horizons. We develop a modified dissipation inequality that requires neither non-negative definite storage functions nor hard IQCs. We then show that linear rate analysis of a given deterministic first-order method is equivalent to uniform stability analysis of a related scaled system, which enables the derivation of linear rate conditions from standard IQCs on a scaled operator. We derive a soft Zames-Falb IQC and use it in the modified dissipation inequality, leading to a GEVP formulation for linear rate analysis of first-order optimization methods.

In the second part of this dissertation, we extend the IQC framework to stochastic optimization methods, which have been widely applied in empirical risk minimization and machine learning problems. We first combine jump system theory with IQCs to derive LMI conditions for rate analysis of the stochastic average gradient (SAG) method and its variants (SAGA, etc.). The resulting LMI conditions can be used to analyze the convergence rates of SAG, SAGA, and other related variants under uniform or non-uniform sampling strategies. We then develop LMI conditions to analyze the stochastic gradient (SG) method and its variants. The SG method with a constant stepsize typically achieves a linear convergence rate only up to some fixed tolerance. We develop stochastically averaged quadratic constraints with disturbance terms that quantify this inaccuracy of the SG method. Our proposed LMI conditions recover several known results about the SG method and also yield new convergence results under different assumptions.
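
To make the LMI-based rate analysis concrete, here is a minimal sketch (assuming numpy and cvxpy are installed) that certifies a linear rate for gradient descent on an m-strongly convex, L-smooth function. It uses the basic pointwise sector IQC for the gradient and a one-dimensional state-space model, not the dissertation's soft Zames-Falb IQC or scaled-system construction, so it illustrates the general approach rather than the dissertation's sharpest results.

import numpy as np
import cvxpy as cp

def rate_feasible(rho, alpha, m, L):
    # Feasibility of the dissipation-inequality LMI at rate rho for
    # gradient descent x_{k+1} = x_k - alpha * grad f(x_k), modeled as
    # the scalar system A = 1, B = -alpha driven by u_k = grad f(x_k).
    p = cp.Variable()               # weight of the storage function V(x) = p*x^2
    lam = cp.Variable(nonneg=True)  # IQC multiplier
    # Pointwise sector IQC satisfied by u = grad f(y) when f is
    # m-strongly convex and L-smooth: [y; u]^T M [y; u] >= 0.
    M = np.array([[-2.0 * m * L, m + L],
                  [m + L, -2.0]])
    # Quadratic form of V(x_{k+1}) - rho^2 * V(x_k) in the variables (x, u).
    N = np.array([[1.0 - rho**2, -alpha],
                  [-alpha, alpha**2]])
    constraints = [p * N + lam * M << 0, p >= 1.0]  # p >= 1 rules out the trivial scaling
    problem = cp.Problem(cp.Minimize(0), constraints)
    problem.solve(solver=cp.SCS)
    return problem.status in ("optimal", "optimal_inaccurate")

def best_rate(alpha, m, L, tol=1e-3):
    # The LMI is monotone in rho, so bisection finds the smallest certifiable rate.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if rate_feasible(mid, alpha, m, L) else (mid, hi)
    return hi

m, L = 1.0, 10.0
print(best_rate(1.0 / L, m, L))        # about 1 - m/L = 0.9
print(best_rate(2.0 / (m + L), m, L))  # about (L - m)/(L + m), roughly 0.818

Both certified rates match the classical worst-case bounds for gradient descent, which is the sanity check one expects before moving to methods such as Nesterov's, where such certificates are harder to derive by hand.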
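
For the stochastic part, the SAG/SAGA family is easy to state concretely. The following numpy sketch implements the SAGA update with uniform sampling on a synthetic least-squares problem; the problem data, the 1/(3L) stepsize heuristic, and the iteration count are assumptions made for this example, not values from the dissertation.

import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)                  # consistent system, so the minimizer is exact
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

x = np.zeros(d)
table = (A @ x - b)[:, None] * A            # stored gradient of each f_i at the initial point
gbar = table.mean(axis=0)                   # running average of the stored gradients
L_max = np.max(np.sum(A * A, axis=1))       # per-component smoothness: L_i = ||a_i||^2
alpha = 1.0 / (3.0 * L_max)                 # common SAGA stepsize heuristic
for k in range(1, 20001):
    j = rng.integers(n)                     # uniform sampling
    g_new = (A[j] @ x - b[j]) * A[j]        # fresh gradient of f_j
    x -= alpha * (g_new - table[j] + gbar)  # unbiased direction with vanishing variance
    gbar += (g_new - table[j]) / n          # keep the average consistent with the table
    table[j] = g_new
    if k % 4000 == 0:
        print(k, np.linalg.norm(x - x_star))

Unlike the plain SG method below, the error here decays linearly all the way to the solution, which is the behavior the jump-system LMI conditions are used to certify. Non-uniform sampling only changes the distribution of j (and the stepsize rule).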
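
The "linear convergence only up to a fixed tolerance" behavior of the constant-stepsize SG method is also easy to observe numerically. A minimal sketch on a noisy least-squares instance (all problem and algorithm parameters are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.5 * rng.normal(size=n)  # noisy targets
x_star = np.linalg.lstsq(A, b, rcond=None)[0]          # empirical risk minimizer

x = np.zeros(d)
alpha = 0.005                                          # constant stepsize
for k in range(1, 50001):
    i = rng.integers(n)
    g = (A[i] @ x - b[i]) * A[i]                       # stochastic gradient of f_i
    x -= alpha * g
    if k in (10, 100, 1000, 10000, 50000):
        print(k, np.linalg.norm(x - x_star))           # decays, then plateaus

The error first contracts at a linear rate and then stalls at a level that scales with alpha; shrinking alpha lowers the plateau but slows the linear phase. This is the trade-off that the disturbance terms in the stochastically averaged quadratic constraints are meant to quantify.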