Learning the low-rank structure of a data matrix is a powerful approach to handling ``big data''. However, in many modern applications, such as recommendation systems and sensor localization, it is impossible or too costly to obtain all of the data, resulting in a data matrix with most entries missing. A problem of great interest is then to infer the missing entries from very few observations, usually under the assumption that the true data matrix is low-rank; this is widely known as the matrix completion problem. The most popular approach to large-scale matrix completion is the matrix factorization (MF) formulation, which has achieved great empirical success. However, due to the non-convexity introduced by the factorization model, little is known about whether and when algorithms for the MF formulation will produce a good solution. In this thesis, we study the non-convex MF formulation for matrix completion, both algorithmically and theoretically.

First, we empirically analyze several standard algorithms for the MF formulation. We present a novel finding that the recovery ability of an algorithm depends mainly on whether it can control the row norms (or incoherence constants) of the iterates. Motivated by this finding, we propose a new formulation that adds either constraints or penalty-function-type regularizers to control the row norms directly. Simulation results show that algorithms for the new formulation can recover the matrix in a regime very close to the fundamental limit, outperforming the standard algorithms.

We then establish a theoretical guarantee that the new MF formulation correctly recovers the underlying low-rank matrix. In particular, we show that under conditions similar to those in previous works, many standard optimization algorithms converge to the global optima of the new MF formulation and recover the true low-rank matrix. This result is rather surprising, since we prove convergence to global optima for a non-convex problem.
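To make the constrained MF formulation concrete, the following is a minimal sketch (not the thesis's actual algorithm) of projected gradient descent on the factored objective, where after each gradient step every row of the factors $U$ and $V$ is projected onto a Euclidean ball of radius $\alpha$, directly enforcing the row-norm (incoherence) control described above. All names, the step size, and the radius $\alpha$ are illustrative choices.

```python
import numpy as np

def mf_complete(M_obs, mask, rank, alpha, lr=0.005, iters=5000, seed=0):
    """Sketch: projected gradient descent on the MF completion objective
        min_{U,V} || mask * (U V^T - M_obs) ||_F^2
    where each row of U and V is projected onto the Euclidean ball of
    radius alpha after every step (row-norm / incoherence control)."""
    rng = np.random.default_rng(seed)
    n, m = M_obs.shape
    U = rng.standard_normal((n, rank)) / np.sqrt(rank)
    V = rng.standard_normal((m, rank)) / np.sqrt(rank)
    for _ in range(iters):
        R = mask * (U @ V.T - M_obs)          # residual on observed entries only
        U, V = U - lr * (R @ V), V - lr * (R.T @ U)
        for X in (U, V):                      # project each row onto the alpha-ball
            norms = np.linalg.norm(X, axis=1, keepdims=True)
            X *= np.minimum(1.0, alpha / np.maximum(norms, 1e-12))
    return U @ V.T
```

In a small synthetic experiment (e.g. a 30x30 rank-2 matrix with roughly 60% of entries observed), this kind of row-norm projection keeps the iterates incoherent while the unobserved entries are filled in by the low-rank fit.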
To the best of our knowledge, our result is the first to provide an exact recovery guarantee for many standard algorithms. A major difference between our work and existing results is that we do not need resampling (i.e., using independent samples at each iteration) in either the algorithm or its analysis. Technically, we develop a novel perturbation analysis for matrix factorization, which may be of independent interest.