Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

Zixiang Chen; Dongruo Zhou; Quanquan Gu

Proceedings of The 33rd International Conference on Algorithmic Learning Theory, PMLR 167:176-204, 2022.

Abstract

Escaping from saddle points and finding local minima is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach for this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the existing best stochastic gradient complexity for this type of algorithm is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima. We show that LENA with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size shrinkage scheme to control the average movement of the iterates, which leads to faster convergence to the local minima.
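
As a quick check of the parenthetical rate in the abstract (a worked substitution, not taken from the paper's proofs): setting $\epsilon_H = \sqrt{\epsilon}$ makes the second term the same order as the first, so the two-term bound collapses to a single term.

$$\epsilon_H^{-6} = \left(\epsilon^{1/2}\right)^{-6} = \epsilon^{-3}, \qquad \tilde O\left(\epsilon^{-3} + \epsilon_H^{-6}\right) = \tilde O\left(\epsilon^{-3}\right).$$

Compared with the earlier $\tilde O(\epsilon^{-3.5})$ rate, the ratio is $\epsilon^{-3.5} / \epsilon^{-3} = \epsilon^{-0.5}$. For an illustrative value $\epsilon = 10^{-4}$ (chosen here only as an example), that ratio is $10^{2}$, ignoring the constants and logarithmic factors hidden by $\tilde O$.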

Cite this Paper

BibTeX


@InProceedings{pmlr-v167-chen22b,
  title = {Faster Perturbed Stochastic Gradient Methods for Finding Local Minima},
  author = {Chen, Zixiang and Zhou, Dongruo and Gu, Quanquan},
  booktitle = {Proceedings of The 33rd International Conference on Algorithmic Learning Theory},
  pages = {176--204},
  year = {2022},
  editor = {Dasgupta, Sanjoy and Haghtalab, Nika},
  volume = {167},
  series = {Proceedings of Machine Learning Research},
  month = {29 Mar--01 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v167/chen22b/chen22b.pdf},
  url = {https://proceedings.mlr.press/v167/chen22b.html},
  abstract = 	 {Escaping from saddle points and finding local minimum is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach for this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the existing best stochastic gradient complexity for this type of algorithms is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima. We show that LENA with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size shrinkage scheme to control the average movement of the iterates, which leads to faster convergence to the local minima.}
}

Endnote

%0 Conference Paper
%T Faster Perturbed Stochastic Gradient Methods for Finding Local Minima
%A Zixiang Chen
%A Dongruo Zhou
%A Quanquan Gu
%B Proceedings of The 33rd International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Sanjoy Dasgupta
%E Nika Haghtalab
%F pmlr-v167-chen22b
%I PMLR
%P 176--204
%U https://proceedings.mlr.press/v167/chen22b.html
%V 167
%X Escaping from saddle points and finding local minimum is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach for this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the existing best stochastic gradient complexity for this type of algorithms is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima. We show that LENA with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size shrinkage scheme to control the average movement of the iterates, which leads to faster convergence to the local minima.

APA


Chen, Z., Zhou, D. & Gu, Q. (2022). Faster Perturbed Stochastic Gradient Methods for Finding Local Minima. Proceedings of The 33rd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 167:176-204. Available from https://proceedings.mlr.press/v167/chen22b.html.
