Gao, Xiang
Low-Order Optimization Algorithms: Iteration Complexity and Applications
Thesis or Dissertation
2018-08-14
2018-05
https://hdl.handle.net/11299/199083
University of Minnesota Ph.D. dissertation. May 2018. Major: Industrial Engineering. Advisor: Shuzhong Zhang. 1 computer file (PDF); ix, 208 pages.
en

Efficiency and scalability have become the new norms for evaluating optimization algorithms in the modern era of big data analytics. Despite their superior local convergence properties, second- and higher-order methods are often at a disadvantage when dealing with large-scale problems arising from machine learning. The reason is that the amount of information these methods require, and the cost of computing the relevant quantities (e.g., the Newton direction), are exceedingly large. Hence, they are not scalable, at least not in a naive way. For exactly the same reason, lower-order (first-order and zeroth-order) methods, with their substantially lower computational overhead per iteration, have received much attention and become popular in recent years. In this thesis, we present a systematic study of lower-order algorithms for solving a wide range of optimization models.

As a starting point, the alternating direction method of multipliers (ADMM) is studied and shown to be an efficient approach for solving large-scale separable optimization problems with linear constraints. However, the ADMM was originally designed for two-block optimization models, and its subproblems are not always easy to solve. There are two possible ways to broaden the ADMM's scope of application: (1) simplify its subroutines so that they fit a broader scheme of lower-order algorithms; (2) extend it to a more general framework of multi-block problems. Depending on the informational structure of the underlying problem, we develop a suite of first-order and zeroth-order variants of the ADMM, where the trade-offs between the required information and the computational complexity are made explicit. The new variants make the method applicable to a much broader class of problems in which only noisy estimates of the gradient or the function values are accessible. Moreover, we extend the ADMM framework to a general multi-block convex optimization model with a coupled objective function and linear constraints. Based on a linearization scheme that decouples the objective function, several deterministic first-order algorithms are developed for both two-block and multi-block problems, and we show that, under suitable conditions, a sublinear convergence rate can be established for these methods. It is well known that the original ADMM may fail to converge when the number of blocks exceeds two. To overcome this difficulty, we propose a randomized primal-dual proximal block coordinate updating framework that includes several existing ADMM-type algorithms as special cases. Our results show that if an appropriate randomization procedure is used, then a sublinear rate of convergence in expectation can be guaranteed for the multi-block ADMM, without assuming strong convexity or any additional conditions. The new approach is also extended to problems where only a stochastic approximation of the (sub-)gradient of the objective is available.
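For reference, the classical two-block setting from which the above development starts can be sketched as follows; the notation here ($f$, $g$, $A$, $B$, $b$, and the penalty parameter $\rho$) is the standard textbook form rather than notation taken from the thesis itself:
\[
\min_{x,\,y}\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b,
\]
for which the ADMM iterates on the augmented Lagrangian
\[
L_\rho(x,y,\lambda) = f(x) + g(y) + \lambda^\top (Ax + By - b) + \frac{\rho}{2}\,\|Ax + By - b\|^2
\]
via
\[
x^{k+1} \in \arg\min_x L_\rho(x, y^k, \lambda^k), \quad
y^{k+1} \in \arg\min_y L_\rho(x^{k+1}, y, \lambda^k), \quad
\lambda^{k+1} = \lambda^k + \rho\,(Ax^{k+1} + By^{k+1} - b).
\]
The first-order and zeroth-order variants described above replace or approximate these exact subproblem minimizations with cheaper steps that use only (noisy) gradient or function-value information.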
Furthermore, we study various zeroth-order algorithms for both black-box optimization and online learning problems. In particular, for black-box optimization, we consider three different settings: (1) stochastic programming with the restriction that only one random sample can be drawn at any given decision point; (2) a general nonconvex optimization framework with what we call the weakly pseudo-convex property; (3) a setting in which an estimate of the objective value with controllable noise is available. We further extend the idea to the stochastic bandit online learning problem, where the nonsmoothness of the loss function and the one-random-sample scheme are discussed.
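As a concrete illustration of the function-value-only access model underlying the black-box and bandit settings above, a standard smoothing-based zeroth-order gradient estimator can be written (in generic notation, not the thesis's own) as
\[
g_\mu(x) = \frac{f(x + \mu u) - f(x)}{\mu}\, u, \qquad u \sim \mathcal{N}(0, I),\ \ \mu > 0,
\]
which is an unbiased estimator of $\nabla f_\mu(x)$ for the Gaussian-smoothed function $f_\mu(x) = \mathbb{E}_u[f(x + \mu u)]$; in the setting where only one random sample can be drawn at a given decision point, one-point variants such as $\frac{1}{\mu} f(x + \mu u)\, u$ serve the same purpose.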