Over the last several decades, tremendous tools have been developed in machine learning, ranging from statistical models to scalable algorithms and from learning strategies to various tasks, with far-reaching influence on broad applications ranging from image and speech recognition to recommender systems, and from bioinformatics to robotics. As we enter the era of big data, large-scale machine learning tools become increasingly important for training big models on big data. Since machine learning problems are fundamentally empirical risk minimization problems, large-scale optimization plays a key role in building a large-scale machine learning system. However, scaling optimization algorithms such as stochastic gradient descent (SGD) in a distributed system raises issues such as synchronization, since these algorithms were not designed for this purpose. Synchronization is required to guarantee consistency, i.e., that the parameters on different machines are the same. Synchronization leads to blocking computation and degrades the performance of a distributed system; without blocking, overwriting may happen and consistency cannot be guaranteed. Moreover, SGD may not be suitable for constrained optimization problems.

To address these issues, we develop several novel optimization algorithms suitable for distributed systems in two settings: unconstrained optimization and equality-constrained optimization. First, building on SGD in the unconstrained setting, we propose online randomized block coordinate descent, which randomly updates a block of parameters using some samples and thus allows the overwriting that occurs in SGD.
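The idea of online randomized block coordinate descent can be illustrated with a minimal sketch on a least-squares objective. The function name, block partition, and step size below are illustrative assumptions, not the exact algorithm proposed in this work: each iteration draws one sample (the online part) and updates only one randomly chosen block of coordinates (the block coordinate part).

```python
import numpy as np

def orbcd(X, y, n_blocks=4, step=0.05, n_iters=8000, seed=0):
    """Online randomized block coordinate descent for least-squares
    (illustrative sketch). Each iteration uses one random sample and
    updates one random block of coordinates with its stochastic gradient."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    blocks = np.array_split(np.arange(d), n_blocks)  # fixed coordinate blocks
    for _ in range(n_iters):
        i = rng.integers(n)                  # random sample (online step)
        b = blocks[rng.integers(n_blocks)]   # random coordinate block
        resid = X[i] @ w - y[i]              # scalar residual for sample i
        w[b] -= step * resid * X[i, b]       # update only the chosen block
    return w
```

Because each update touches only one coordinate block, concurrent workers that overwrite different blocks do comparatively little harm, which is the intuition behind tolerating overwriting in distributed SGD.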
Second, instead of striving to maintain consistency at each iteration in the unconstrained setting, we turn to equality-constrained optimization, which guarantees eventual consistency, i.e., the parameters on different machines are not the same at each iteration but will be the same eventually. The equality-constrained setting also covers cases where SGD cannot be applied. The alternating direction method of multipliers (ADMM) provides a suitable framework for equality-constrained optimization but raises several issues: (1) it does not provide a systematic way to solve subproblems; (2) it requires solving all subproblems and synchronization; (3) it is a batch method and cannot process data online. For the first issue, we propose Bregman ADMM, which provides a unified framework for solving subproblems efficiently. For the second issue, we propose the parallel direction method of multipliers (PDMM), which randomly picks some subproblems to solve and performs asynchronous aggregation. Finally, we introduce online ADMM so that the algorithm can process partial data at each iteration.

To validate the effectiveness and scalability of the proposed algorithms, we apply them to a variety of applications, including sparse structure learning and maximum a posteriori (MAP) inference in probabilistic graphical models, and online dictionary learning. We also implement the proposed methods on various architectures, including clusters with hundreds to thousands of CPU cores, and GPUs. Experimental results show that the proposed methods scale gracefully with the number of cores and outperform state-of-the-art methods.
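The eventual-consistency view of equality-constrained optimization can be sketched with standard consensus ADMM on distributed least-squares. This is a textbook illustration under assumed names and parameters, not the Bregman ADMM or PDMM algorithms proposed above: each "machine" i holds its own data (A_i, b_i) and a local copy x_i constrained to equal a global variable z; the copies disagree at intermediate iterations but agree with z eventually.

```python
import numpy as np

def consensus_admm(As, bs, rho=1.0, n_iters=300):
    """Consensus ADMM for distributed least-squares (illustrative sketch).
    Solves min sum_i (1/2)||A_i x_i - b_i||^2 s.t. x_i = z for all i."""
    d = As[0].shape[1]
    m = len(As)
    xs = [np.zeros(d) for _ in range(m)]
    us = [np.zeros(d) for _ in range(m)]   # scaled dual variables
    z = np.zeros(d)
    for _ in range(n_iters):
        for i in range(m):                 # local x-updates (parallelizable)
            lhs = As[i].T @ As[i] + rho * np.eye(d)
            rhs = As[i].T @ bs[i] + rho * (z - us[i])
            xs[i] = np.linalg.solve(lhs, rhs)
        # global averaging step enforcing the equality constraints
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        for i in range(m):                 # dual ascent on x_i = z
            us[i] += xs[i] - z
    return z, xs
```

The inner x-updates are independent, which is what makes the method attractive for distributed systems; PDMM relaxes the requirement that all of them be solved and synchronized at every iteration.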