Browsing by Author "Chen, Xiangyi"
Now showing 1 - 2 of 2
Item
Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms (2022-03)
Chen, Xiangyi
Recently, there has been growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent examples are signSGD with majority vote, an effective approach for reducing communication via 1-bit compression of the local gradients, and medianSGD, an algorithm recently proposed to ensure robustness against Byzantine workers. The convergence analyses for these algorithms rely critically on the assumption that all the distributed data are drawn i.i.d. from the same distribution. However, in applications such as Federated Learning, the data across different nodes or machines can be inherently heterogeneous, which violates the i.i.d. assumption. This work analyzes signSGD and medianSGD in distributed settings with heterogeneous data. We show that these algorithms are non-convergent whenever there is some disparity between the expected median and mean of the local gradients. To overcome this gap, we provide a novel gradient correction mechanism that perturbs the local gradients with noise, which we show provably closes the gap between the mean and the median of the gradients. The proposed methods largely preserve the desirable properties of these median-based algorithms, such as the low per-iteration communication complexity of signSGD, and further enjoy global convergence to stationary solutions. Our perturbation technique can be of independent interest when one wishes to estimate the mean through a median estimator.
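As a rough illustration of the setting this abstract describes, the following is a minimal NumPy sketch of one communication round of signSGD with majority vote on heterogeneous worker gradients, together with a hypothetical noise perturbation of the local gradients in the spirit of the correction mechanism mentioned above. The function name, the Gaussian noise choice, and the toy gradients are assumptions for illustration, not the thesis's actual algorithm.

```python
import numpy as np

def signsgd_majority_vote(local_grads, noise_std=0.0, rng=None):
    """One communication round of signSGD with majority vote (illustrative sketch).

    Each worker sends only the signs of its (optionally noise-perturbed)
    local gradient, i.e. 1 bit per coordinate; the server aggregates by a
    coordinate-wise majority vote (the sign of the sum of the signs).
    """
    rng = rng or np.random.default_rng(0)
    signs = []
    for g in local_grads:
        if noise_std > 0:                      # hypothetical perturbation of the local gradient
            g = g + rng.normal(0.0, noise_std, size=g.shape)
        signs.append(np.sign(g))               # 1-bit compression per coordinate
    return np.sign(np.sum(signs, axis=0))      # majority vote at the server

# Toy heterogeneous example: in the first coordinate, the sign of the mean
# gradient (+) disagrees with the majority vote over worker signs (-).
grads = [np.array([-1.0, 2.0]), np.array([-1.0, 2.0]), np.array([5.0, 2.0])]
print(np.mean(grads, axis=0))                        # mean points in the +x direction
print(signsgd_majority_vote(grads))                  # majority vote points in the -x direction
print(signsgd_majority_vote(grads, noise_std=5.0))   # with large noise, some votes may flip
```

The toy gradients are chosen so that the median-based update direction disagrees with the mean direction in one coordinate, which is exactly the disparity the abstract identifies as the source of non-convergence under heterogeneous data.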
Item
Understanding Adaptivity in Machine Learning Optimization: Theories and Algorithms (2022-05)
Chen, Xiangyi
Optimization plays an indispensable role in modern machine learning due to its necessity in many aspects, especially model training. Over the past decade, the rapid development of deep machine learning models has posed many new challenges for machine learning optimization. As a result, designing efficient and robust optimization algorithms remains an active research area within machine learning, and several new and notable optimization algorithms have been proposed to tackle the new challenges in model training. An important class of these new algorithms is motivated by the idea of incorporating adaptation into algorithm design, so that the algorithms can adapt to the local geometry of the optimization landscape. However, some of these newly designed algorithms with adaptation are tailored to achieve superior empirical performance for certain classes of optimization problems but are not well understood theoretically, so their performance is less predictable in other domains or applications. In this thesis, we build theories for several algorithms with adaptation. The results of this thesis fall into three parts. In the first part, we analyze a class of algorithms with adaptation, which we call Adam-type algorithms, for nonconvex unconstrained optimization; we provide conditions under which these algorithms converge and shed light on design principles for this class of algorithms. In the second part, we extend this analysis to zeroth-order constrained/unconstrained optimization and propose an algorithm called ZO-AdaMM, which has superior performance in generating black-box adversarial attacks. In the third part, we study the gradient clipping operation for differentially private SGD; gradient clipping adds a form of adaptation to SGD that can potentially hurt convergence. We identify regimes where gradient clipping is not an issue and verify the existence of these regimes in practice. Further, we provide a perturbation mechanism to mitigate the adverse effect caused by gradient clipping.
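As a rough sketch of the gradient clipping operation studied in the third part of this thesis, the snippet below clips each per-example gradient to a fixed norm bound before averaging and adding Gaussian noise, in the usual DP-SGD style. The clipping bound, noise multiplier, and function name are illustrative assumptions rather than the thesis's exact procedure.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One simplified DP-SGD aggregation step (illustrative sketch).

    Each per-example gradient is rescaled so its L2 norm is at most
    `clip_norm` (a norm-dependent rescaling, i.e. a form of adaptation),
    then the clipped gradients are averaged and Gaussian noise calibrated
    to `clip_norm` is added.
    """
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))  # clip to the L2 ball
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean.shape)
    return mean + noise

# Toy example: one large gradient dominates the average before clipping but not after.
grads = [np.array([0.1, 0.0]), np.array([0.0, 0.2]), np.array([10.0, 0.0])]
print(dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.5))
```

Because clipping rescales each gradient by a data-dependent factor, the averaged update can be biased relative to the true mean gradient, which is the potential harm to convergence the abstract refers to.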