Escaping Saddles with Stochastic Gradients
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1155-1164, 2018.
Abstract
We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients indeed exhibit a strong component along these directions. Furthermore, we show that - contrary to the case of isotropic noise - this variance is proportional to the magnitude of the corresponding eigenvalues and not decreasing in the dimensionality. Based upon this observation we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally - and under the same condition - we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
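Below is a minimal toy sketch (not taken from the paper) illustrating the core idea: at a strict saddle point, plain gradient descent stalls, whereas SGD whose gradient noise is correlated with the negative-curvature direction (a CNC-style assumption; the correlation parameter `corr` and the noise model here are hypothetical) escapes without any injected isotropic noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic saddle f(x) = 0.5 * x^T H x with one negative eigenvalue.
d = 20
eigs = np.concatenate(([-1.0], np.linspace(0.5, 2.0, d - 1)))
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]
H = Q @ np.diag(eigs) @ Q.T
v_neg = Q[:, 0]                       # negative-curvature (escape) direction

def grad(x):
    return H @ x

def stoch_grad(x, corr=0.5):
    # Hypothetical noise model: the stochastic gradient carries a component
    # along the negative-curvature direction rather than isotropic noise.
    noise = corr * rng.standard_normal() * v_neg
    return H @ x + noise

eta, T = 0.05, 300
x_gd = np.zeros(d)                    # start exactly at the saddle point
x_sgd = np.zeros(d)
for _ in range(T):
    x_gd -= eta * grad(x_gd)          # gradient is zero here: GD never moves
    x_sgd -= eta * stoch_grad(x_sgd)  # noise seeds growth along v_neg

print("GD  |x . v_neg| =", abs(x_gd @ v_neg))    # stays at 0
print("SGD |x . v_neg| =", abs(x_sgd @ v_neg))   # grows: saddle escaped
```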