Author: Ma, Meng
Title: Efficient Methods for Decentralized Optimization over Graphs
Type: Thesis or Dissertation
Date issued: 2021-04 (accessioned and made available 2021-08-16)
URI: https://hdl.handle.net/11299/223149
Description: University of Minnesota Ph.D. dissertation. April 2021. Major: Electrical Engineering. Advisor: Georgios Giannakis. 1 computer file (PDF); x, 107 pages.
Language: en
Subjects: Decentralized optimization; Distributed learning; Hybrid ADMM; Preconditioned ADMM

Abstract:
The rapid evolution and widespread applicability of machine learning and artificial intelligence have fundamentally reshaped modern life. Three key factors underlie this ubiquity: big data, growing computing power, and improved algorithms. The need for distributed storage and processing arises from a "data deluge" that can overwhelm any single machine, from data dispersed across locations that are prohibitively costly to transfer to a central unit for processing, and from the prevalence of Internet-of-Things (IoT) devices that require real-time response as well as respect for privacy. Modern machine learning algorithms built to exploit such huge amounts of data are often computationally "hungry," and their "appetite" for computing power grows at a pace unmatched by the development of computing hardware. These considerations create a pressing need for distributed optimization algorithms that are scalable yet flexible enough to adapt to various configurations of networked computing nodes.

To cope with these challenges, this thesis first introduces a novel ADMM-based approach, termed hybrid ADMM, for efficient decentralized optimization. By modeling the underlying communication patterns as hypergraphs, it provides a unifying framework that subsumes both the centralized and fully decentralized counterparts as special cases and allows nodes to communicate in centralized and decentralized ways at the same time. Leveraging the expressiveness of hypergraph models, a technique termed "in-network acceleration" is introduced that yields an "almost free" performance gain by exploiting local graph topology. To account for the heterogeneity of nodes and edges, an approach based on diagonal scaling is proposed to handle weighted updates, where suitable edge weights are identified by solving a preconditioning problem. By assigning larger weights to critical edges, the proposed algorithm achieves higher efficiency and becomes more robust to perturbations. Finally, to push the efficiency of the whole system to its full potential, an asynchronous method is introduced that mitigates the straggler problem so that nodes with different processing power can run at full speed. Convergence analysis is provided for the proposed algorithms, revealing the connection between the convergence rate and the spectral properties of the communication graph. Numerical tests on several common tasks over different graphs demonstrate the effectiveness of the proposed algorithms.
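
As orientation for readers unfamiliar with ADMM over networks, the sketch below shows the standard decentralized consensus ADMM baseline that hybrid and preconditioned variants such as those in this thesis build upon. It is a minimal illustration under assumed settings (a least-squares objective, a ring topology, uniform penalty c, and the hypothetical helper decentralized_admm); it is not the thesis's hybrid ADMM.

# Illustrative sketch: standard decentralized consensus ADMM for least squares
# over an undirected graph (not the hybrid/preconditioned ADMM of the thesis).
# Each node i holds private data (A_i, b_i), exchanges its iterate only with
# graph neighbors, and all nodes converge to the global least-squares solution.
import numpy as np

def decentralized_admm(A, b, neighbors, c=1.0, iters=300):
    """A: list of (m_i, d) arrays; b: list of (m_i,) arrays;
    neighbors: list of neighbor index lists; c: ADMM penalty parameter."""
    n = len(A)
    d = A[0].shape[1]
    x = np.zeros((n, d))      # local primal iterates
    alpha = np.zeros((n, d))  # local dual iterates
    # Per-node linear systems (A_i^T A_i + 2 c |N_i| I) for the x-updates.
    lhs = [A[i].T @ A[i] + 2 * c * len(neighbors[i]) * np.eye(d) for i in range(n)]
    for _ in range(iters):
        x_old = x.copy()
        # Primal update: every node solves a small local system in parallel.
        for i in range(n):
            nbr_sum = x_old[neighbors[i]].sum(axis=0)
            rhs = (A[i].T @ b[i]
                   + c * (len(neighbors[i]) * x_old[i] + nbr_sum)
                   - alpha[i])
            x[i] = np.linalg.solve(lhs[i], rhs)
        # Dual update: penalize disagreement with neighbors.
        for i in range(n):
            nbr_sum = x[neighbors[i]].sum(axis=0)
            alpha[i] += c * (len(neighbors[i]) * x[i] - nbr_sum)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 6, 3
    x_true = rng.normal(size=d)
    A = [rng.normal(size=(10, d)) for _ in range(n)]
    b = [Ai @ x_true + 0.01 * rng.normal(size=10) for Ai in A]
    ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]  # ring topology
    x = decentralized_admm(A, b, ring, c=1.0, iters=300)
    x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
    print("max deviation from centralized solution:", np.abs(x - x_star).max())

Each node only solves a small local problem per iteration and communicates with its immediate neighbors; this neighbor-to-neighbor communication pattern is what the hypergraph model of the hybrid ADMM generalizes, and the uniform penalty c is what the preconditioned, edge-weighted variant replaces with tuned weights.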