Escaping Saddles with Stochastic Gradients
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1163-1172, 2018.
Abstract
We analyze the variance of stochastic gradients along negative curvature directions in certain nonconvex machine learning models and show that stochastic gradients indeed exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and is not decreasing in the dimensionality. Based upon this observation, we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally, and under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
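The core claim can be illustrated on a toy problem. The sketch below is not the paper's construction: it uses a hypothetical two-dimensional quadratic with one negative eigenvalue and hand-picked noise and step-size values, simply to show why gradient noise with a nonzero component along a negative-curvature direction lets plain SGD leave a saddle where deterministic gradient descent would stall.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy saddle: f(w) = 0.5 * w @ H @ w with eigenvalues (+1, -1),
# so w = (0, 0) is a strict saddle point.
H = np.diag([1.0, -1.0])

def sgd_step(w, lr=0.1, noise_scale=0.1):
    # Stochastic gradient = true gradient + zero-mean sample noise.
    # Crucially, the noise has a nonzero component along the
    # negative-curvature direction e2 = (0, 1).
    noise = rng.normal(scale=noise_scale, size=2)
    return w - lr * (H @ w + noise)

w = np.zeros(2)          # start exactly at the saddle
for _ in range(200):
    w = sgd_step(w)

# The coordinate along the negative-curvature direction is amplified
# by a factor (1 + lr) each step, so it dominates; with noise = 0,
# plain gradient descent would remain at the saddle forever.
print(abs(w[1]) > abs(w[0]))
```

Along the positive-curvature coordinate the iterate contracts toward zero, while along the negative-curvature coordinate each step multiplies the displacement by 1.1, so even tiny gradient noise is amplified geometrically. This is the mechanism the abstract refers to when it says an explicit isotropic-noise injection can be replaced by an ordinary SGD step.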