How to Escape Saddle Points Efficiently
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1724-1732, 2017.
Abstract
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only polylogarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the nonconvex optimization community.
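The procedure described in the abstract can be sketched as follows: run ordinary gradient descent, and when the gradient becomes small (indicating a first-order stationary point, possibly a saddle), add a small random perturbation drawn uniformly from a ball, then continue descending. This is only an illustrative sketch; the function names, step size `eta`, perturbation radius `r`, and thresholds below are assumptions for demonstration, not the paper's exact parameter choices.

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.1, r=1e-2, g_thresh=1e-3,
                 t_thresh=10, max_iter=1000, rng=None):
    """Illustrative sketch of perturbed gradient descent.

    Takes plain gradient steps; whenever the gradient norm falls below
    g_thresh and no perturbation happened within the last t_thresh
    iterations, adds a perturbation sampled uniformly from a ball of
    radius r to escape a potential saddle point.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x0, dtype=float)
    t_last = -t_thresh - 1  # iteration index of the last perturbation
    for t in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh and t - t_last > t_thresh:
            # Near a first-order stationary point: sample uniformly
            # from a ball of radius r (random direction, radius
            # scaled by U^(1/d)) and perturb the iterate.
            xi = rng.normal(size=x.shape)
            xi *= r * rng.random() ** (1.0 / x.size) / np.linalg.norm(xi)
            x = x + xi
            t_last = t
        else:
            x = x - eta * g
    return x

# Example: f(x, y) = x^4/4 - x^2/2 + y^2/2 has a strict saddle at the
# origin and local minima at (+/-1, 0). Plain GD started exactly at the
# saddle never moves; the perturbed variant escapes toward a minimum.
grad_f = lambda v: np.array([v[0] ** 3 - v[0], v[1]])
x_final = perturbed_gd(grad_f, [0.0, 0.0])
```

Starting exactly at the saddle, the first perturbation gives the iterate a nonzero component along the negative-curvature direction, which gradient descent then amplifies until the iterate settles near one of the two minima.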