Browsing by Subject "Training deep networks"
Now showing 1 - 1 of 1
Item
Noisy Iterative Learning Algorithms: Optimization, Generalization, and Privacy (2022-03) Zhou, Yingxue

Deep neural networks have achieved remarkable success in a wide range of applications, including image recognition, medical imaging, and language processing. This broad success comes from a confluence of factors, including the collection of massive labeled datasets and advances in network models. However, the progress in collecting large amounts of data and designing complex network architectures also creates new challenges in training deep neural networks. These challenges are threefold. First, training a neural network amounts to solving an optimization problem whose loss function is non-convex and high-dimensional and contains local minima and saddle points, which makes training difficult. Second, many applications involve training data with highly sensitive information about individuals, and the rich representations of deep neural networks can potentially reveal fine details of this private data. Third, the theoretical understanding of the generalization performance of deep networks is limited.

Gradient-based methods are the standard tools for optimizing and training neural networks, and one promising direction for tackling the above challenges is the family of noisy iterative algorithms, which inject extra noise into the parameter updates. Such noise has demonstrated advantages in escaping saddle points, boosting the generalization of deep neural networks, and preserving privacy in the sense of differential privacy. However, recent advances reveal contradictory roles of noise with respect to these challenges: a model trained with more injected noise preserves more privacy and attains a lower generalization error, i.e., a smaller gap between training loss and population loss, yet it exhibits an undesirably higher optimization error, i.e., a higher training loss. The ultimate goal of a learning algorithm is a model that generalizes well to unseen instances and achieves a small population loss, whose upper bound typically comprises both the generalization error and the optimization error.

In this thesis, we dive deep into the impact of noise from the perspectives of optimization, generalization, and privacy. We develop advanced noisy iterative learning algorithms that enable trained deep models to generalize well to unseen instances while preserving the privacy of the training data, and we theoretically establish generalization-error and optimization-error bounds for these algorithms together with rigorous privacy guarantees. Succinctly, we make the following contributions. From the privacy and generalization perspective, we design stable adaptive gradient descent algorithms by leveraging the fact that differential privacy implies generalization, and we prove that the proposed algorithms achieve tight population risk bounds for both convex and non-convex objective functions. We then derive stability-based generalization bounds for a broad family of noisy iterative algorithms, developing novel proof techniques that unify two lines of literature: differential privacy and stochastic gradient Langevin dynamics.
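For concreteness, the following is a minimal sketch of the kind of noisy parameter update this family of algorithms shares, in the style of stochastic gradient Langevin dynamics (SGLD). The toy least-squares objective, function names, and hyperparameter values are illustrative assumptions for this listing, not taken from the thesis.

```python
# A minimal SGLD-style noisy gradient update on a toy least-squares
# problem. All names and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = X @ w_true + small observation noise.
X = rng.normal(size=(128, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=128)

def grad(w, Xb, yb):
    """Gradient of the mean squared error 0.5 * ||Xb @ w - yb||^2 / n."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

def sgld_step(w, Xb, yb, step_size=1e-2, beta=1e4):
    """Noisy update: w <- w - eta * grad + sqrt(2 * eta / beta) * xi,
    with xi ~ N(0, I) and beta the inverse temperature.
    As beta -> infinity the noise vanishes and plain SGD is recovered."""
    xi = rng.normal(size=w.shape)
    return w - step_size * grad(w, Xb, yb) + np.sqrt(2.0 * step_size / beta) * xi

w = np.zeros(10)
for _ in range(500):
    idx = rng.choice(len(y), size=32, replace=False)  # sample a mini-batch
    w = sgld_step(w, X[idx], y[idx])
print("distance to w_true:", np.linalg.norm(w - w_true))
```

The injected Gaussian term is the single knob behind the trade-offs discussed above: larger noise (smaller beta) helps privacy and generalization but raises the training loss.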
To tackle the trade-off between optimization and privacy in training deep networks, we develop a modular private optimization algorithm that substantially improves utility while preserving privacy; the algorithm applies specialized noise structures that exploit the particular geometry of the gradient space in deep learning. From the generalization and optimization perspective, we propose a novel noisy truncated stochastic gradient descent algorithm for escaping saddle points and boosting generalization in deep learning. Our theoretical analysis characterizes the trade-off between optimization and generalization in terms of several important factors, such as the learning rate and the noise variance. These contributions can lead to new directions and provide insights that help deep learning practitioners develop advanced models.
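As context for the private-optimization contribution, the sketch below shows the standard DP-SGD step (per-example gradient clipping followed by Gaussian noise calibrated to the clipping threshold), the well-known baseline of Abadi et al. (2016) against which private optimizers are typically measured. It is not the thesis's modular algorithm, and the toy per-example-gradient function is an illustrative assumption.

```python
# Sketch of a standard DP-SGD step: clip each per-example gradient to an
# L2 norm of at most clip_norm, average, then add Gaussian noise whose
# scale is calibrated to the clipping threshold. This is the common
# baseline for private optimization, not the thesis's modular algorithm.
import numpy as np

rng = np.random.default_rng(0)

def per_example_grads(w, Xb, yb):
    """Per-example gradients of the squared error 0.5 * (x @ w - y)^2."""
    residuals = Xb @ w - yb            # shape (batch,)
    return residuals[:, None] * Xb     # shape (batch, dim)

def dp_sgd_step(w, Xb, yb, step_size=0.1, clip_norm=1.0, noise_multiplier=1.0):
    grads = per_example_grads(w, Xb, yb)
    # Rescale any per-example gradient whose L2 norm exceeds clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Noise on the averaged gradient: std = noise_multiplier * clip_norm / batch.
    sigma = noise_multiplier * clip_norm / len(yb)
    noisy_mean = grads.mean(axis=0) + sigma * rng.normal(size=w.shape)
    return w - step_size * noisy_mean

# Usage on toy data: one private step on a batch of 64 examples.
Xb = rng.normal(size=(64, 10))
yb = rng.normal(size=64)
w = dp_sgd_step(np.zeros(10), Xb, yb)
```

Clipping bounds each example's influence on the update, which is what lets the added Gaussian noise translate into a differential privacy guarantee; the same clip-and-noise structure also makes the utility cost of privacy visible, the trade-off the thesis targets.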