Escaping Saddles with Stochastic Gradients

Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1155-1164, 2018.

Abstract

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients indeed exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and does not decrease with the problem dimension. Based upon this observation, we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally, under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
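
To make the contrast in the abstract concrete, below is a minimal numpy sketch on a synthetic quadratic saddle f(x) = 0.5 x^T H x. It is not the paper's experimental setup: the "SGD-style" perturbation simply assumes gradient noise whose variance along each Hessian eigenvector is proportional to |lambda_j| (the property the abstract describes), while the alternative injects a fixed-radius isotropic perturbation whose component along the escape direction shrinks like 1/sqrt(d).

    # Minimal sketch (not from the paper): a synthetic quadratic saddle used to
    # contrast (a) gradient descent restarted with a small isotropic perturbation
    # and (b) a single SGD-style step whose noise -- by assumption here -- has
    # standard deviation sqrt(|lambda_j|) along each Hessian eigenvector, and
    # hence a dimension-free component along the negative-curvature direction.
    import numpy as np

    rng = np.random.default_rng(0)

    def escape_steps(d, mode, eta=0.05, tol=-0.1, max_iter=5000):
        """Plain GD steps needed to reach f(x) < tol on f(x) = 0.5 x^T H x,
        starting from one perturbation step taken at the saddle x = 0."""
        lams = np.full(d, 0.5)
        lams[-1] = -1.0                       # single negative-curvature direction
        H = np.diag(lams)

        if mode == "isotropic":
            # explicit noise injection: isotropic perturbation of radius eta
            p = rng.standard_normal(d)
            x = -eta * p / np.linalg.norm(p)
        else:                                 # "sgd"
            # hypothetical curvature-aligned gradient noise; the full gradient
            # at the saddle itself is zero, so only the noise component remains
            x = -eta * np.sqrt(np.abs(lams)) * rng.standard_normal(d)

        for t in range(max_iter):
            if 0.5 * x @ H @ x < tol:
                return t
            x = x - eta * (H @ x)             # plain gradient descent afterwards
        return max_iter

    for d in (10, 100, 1000):
        iso = np.median([escape_steps(d, "isotropic") for _ in range(20)])
        sgd = np.median([escape_steps(d, "sgd") for _ in range(20)])
        print(f"d={d:5d}   isotropic perturbation: {iso:5.0f} steps   SGD-style step: {sgd:5.0f} steps")

Under this toy model the isotropic variant needs more gradient steps as d grows, while the SGD-style step escapes in a roughly dimension-independent number of steps, mirroring the dimension-free rate claimed in the abstract.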

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-daneshmand18a,
  title     = {Escaping Saddles with Stochastic Gradients},
  author    = {Daneshmand, Hadi and Kohler, Jonas and Lucchi, Aurelien and Hofmann, Thomas},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1155--1164},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/daneshmand18a/daneshmand18a.pdf},
  url       = {https://proceedings.mlr.press/v80/daneshmand18a.html},
  abstract  = {We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients indeed exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and does not decrease with the problem dimension. Based upon this observation, we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally, under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.}
}
Endnote
%0 Conference Paper
%T Escaping Saddles with Stochastic Gradients
%A Hadi Daneshmand
%A Jonas Kohler
%A Aurelien Lucchi
%A Thomas Hofmann
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-daneshmand18a
%I PMLR
%P 1155--1164
%U https://proceedings.mlr.press/v80/daneshmand18a.html
%V 80
%X We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients indeed exhibit a strong component along these directions. Furthermore, we show that, contrary to the case of isotropic noise, this variance is proportional to the magnitude of the corresponding eigenvalues and does not decrease with the problem dimension. Based upon this observation, we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally, under the same condition, we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
APA
Daneshmand, H., Kohler, J., Lucchi, A. & Hofmann, T. (2018). Escaping Saddles with Stochastic Gradients. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1155-1164. Available from https://proceedings.mlr.press/v80/daneshmand18a.html.
