Author: Tao, Shaozhe
Date accessioned: 2018-08-14
Date available: 2018-08-14
Date issued: 2018-05
URI: https://hdl.handle.net/11299/199006
Description: University of Minnesota Ph.D. dissertation. May 2018. Major: Industrial and Systems Engineering. Advisors: Shuzhong Zhang, Daniel Boley. 1 computer file (PDF); xii, 173 pages.

Abstract: Many problems in machine learning can be formulated as optimization models with well-structured constraints. Driven in part by such applications, the need to solve very large-scale optimization models is pushing the performance limits of traditional state-of-the-art methods. In this thesis, we conduct a systematic study of scalable optimization methods. Our investigation covers three main aspects: the role of special structures in the models, the convergence properties of the algorithms, and applications in machine learning.

First, we study popular scalable methods for sparse structured models, including the alternating direction method of multipliers, the coordinate descent method, the proximal gradient method, and the accelerated proximal gradient method. In contrast to the many global convergence results in the literature, we are particularly interested in local convergence behavior. We establish local bounds for these methods on the LASSO model (see the first sketch below), showing their eventual local linear convergence. We show that all of the methods can be analyzed as eigenvalue problems, so that a spectral analysis becomes applicable. We also observe that, when initialized far from the solution, the spectral analysis implies that one may obtain a sequence of iterates that appears to stagnate but is actually taking small constant steps toward the solution. Moreover, we illustrate how the unaccelerated proximal gradient method can sometimes be faster than the accelerated proximal gradient method once the iterates get close enough to the solution. A comprehensive comparison of all the methods is presented.

Next, we move on to group-sparse structured models. We develop an inverse covariance estimator that regularizes for overlapping group sparsity and provides better estimates, especially when the dimension is much larger than the number of samples. Furthermore, we extend the estimator to a general setting that covers any convex differentiable objective with conic constraints. The general estimator can exploit the domain structure to reduce the computation cost. The Frank-Wolfe method we design (see the second sketch below) leverages the decomposition within the group structure, thereby speeding up computation. Simulations and applications on real data demonstrate both the stability and the scalability of this estimator, with the results showing noticeable improvement.

Finally, we explore a certain low-rank structure in tensors. We establish a connection between the low-rank property of a tensor and group sparsity in its factor matrices. This yields a way to find a low-rank tensor decomposition via regularized multiconvex optimization (see the final sketch below). Our approach assumes no prior knowledge of the tensor rank. We propose applying a block coordinate descent method, since each block update can be implemented efficiently. Consequently, we show that our approach can also be used to solve the low-rank tensor completion problem.

Language: en
Subjects: Machine Learning; Optimization; Structured Models
Title: Scalable Optimization Methods for Machine Learning: Structures, Properties and Applications
Type: Thesis or Dissertation
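
First sketch. A minimal illustration, not the dissertation's code, of the proximal gradient method (ISTA) applied to the LASSO model referenced in the abstract, minimize over x of (1/2)||Ax - b||_2^2 + lam * ||x||_1. The names A, b, lam, soft_threshold, and ista, as well as the synthetic data and parameter values, are assumptions made for illustration only.

    # Proximal gradient (ISTA) sketch for the LASSO model (illustrative only).
    import numpy as np

    def soft_threshold(z, t):
        """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def ista(A, b, lam, num_iters=500):
        """Unaccelerated proximal gradient iterations for the LASSO objective."""
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(num_iters):
            grad = A.T @ (A @ x - b)             # gradient of the smooth least-squares term
            x = soft_threshold(x - step * grad, step * lam)
        return x

    # Small synthetic example (assumed data, not from the thesis).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))
    x_true = np.zeros(100)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(40)
    x_hat = ista(A, b, lam=0.1)

An accelerated variant (FISTA) adds a momentum term to the same update; the abstract notes that, close to the solution, the unaccelerated iteration can nonetheless be locally faster.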
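
Second sketch. The abstract mentions a Frank-Wolfe method designed to exploit group structure. What follows is only a generic Frank-Wolfe template over an l1-ball constraint, given as a hedged sketch of the algorithmic pattern rather than the dissertation's group-structured estimator; the objective, the radius tau, and all function names are illustrative assumptions.

    # Generic Frank-Wolfe sketch over the l1-ball {x : ||x||_1 <= tau} (illustrative only).
    import numpy as np

    def frank_wolfe_l1(grad_f, x0, tau, num_iters=200):
        """Frank-Wolfe iterations using the closed-form linear minimization oracle."""
        x = x0.copy()
        for k in range(num_iters):
            g = grad_f(x)
            # Linear minimization oracle: the l1-ball vertex most aligned with -g.
            i = np.argmax(np.abs(g))
            s = np.zeros_like(x)
            s[i] = -tau * np.sign(g[i])
            gamma = 2.0 / (k + 2.0)              # standard diminishing step-size rule
            x = (1.0 - gamma) * x + gamma * s
        return x

    # Example: smooth least-squares objective f(x) = 0.5 * ||A x - b||^2 (assumed data).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 60))
    b = rng.standard_normal(30)
    x_hat = frank_wolfe_l1(lambda x: A.T @ (A @ x - b), np.zeros(60), tau=5.0)

Each iterate is a convex combination of at most a few oracle vertices, which is the property a structured oracle can exploit to keep per-iteration cost low.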
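
Final sketch. The last part of the abstract connects tensor low rank to column-wise group sparsity of the factor matrices and proposes block coordinate descent. The sketch below is a simplified illustration under that reading, not the dissertation's algorithm: each block update is a single proximal-gradient step with a column-wise group penalty on a 3-way CP model, and all names, shapes, and parameter values are assumptions.

    # Block coordinate descent sketch for a column-regularized CP decomposition (illustrative only).
    import numpy as np

    def unfold(T, mode):
        """Mode-n unfolding of a 3-way tensor (rows indexed by the chosen mode)."""
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def khatri_rao(U, V):
        """Column-wise Kronecker (Khatri-Rao) product of two factor matrices."""
        R = U.shape[1]
        return np.einsum('ir,jr->ijr', U, V).reshape(-1, R)

    def prox_columns(M, t):
        """Prox of t * sum_r ||M[:, r]||_2: shrinks entire columns toward zero."""
        norms = np.maximum(np.linalg.norm(M, axis=0, keepdims=True), 1e-12)
        return M * np.maximum(1.0 - t / norms, 0.0)

    def regularized_cp(T, R=10, lam=0.5, num_iters=200):
        """Cycle over the CP factors, applying one proximal-gradient step per block."""
        rng = np.random.default_rng(0)
        factors = [rng.standard_normal((T.shape[n], R)) for n in range(3)]
        for _ in range(num_iters):
            for n in range(3):
                others = [factors[m] for m in range(3) if m != n]
                KR = khatri_rao(others[0], others[1])       # (product of other dims) x R
                Tn = unfold(T, n)
                step = 1.0 / np.linalg.norm(KR.T @ KR, 2)   # 1 / Lipschitz constant of the block
                grad = (factors[n] @ KR.T - Tn) @ KR
                factors[n] = prox_columns(factors[n] - step * grad, step * lam)
        return factors

    # Synthetic rank-3 tensor; extra factor columns tend to shrink toward zero.
    rng = np.random.default_rng(1)
    A, B, C = (rng.standard_normal((d, 3)) for d in (15, 16, 17))
    T = np.einsum('ir,jr,kr->ijk', A, B, C)
    A_hat, B_hat, C_hat = regularized_cp(T, R=10, lam=0.5)

Because the column-wise penalty can drive whole rank-one components to zero, the effective rank emerges from the optimization rather than being fixed in advance, consistent with the abstract's statement that no prior knowledge of the tensor rank is assumed.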