Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization

Rong Ge, Zhize Li, Weiyao Wang, Xiang Wang
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:1394-1448, 2019.

Abstract

Variance reduction techniques like SVRG provide simple and fast algorithms for optimizing a convex finite-sum objective. For nonconvex objectives, these techniques can also find a first-order stationary point (with small gradient). However, in nonconvex optimization it is often crucial to find a second-order stationary point (with small gradient and almost PSD Hessian). In this paper, we show that Stabilized SVRG (a simple variant of SVRG) can find an $\epsilon$-second-order stationary point using only $\widetilde{O}(n^{2/3}/\epsilon^2+n/\epsilon^{1.5})$ stochastic gradients. To the best of our knowledge, this is the first second-order guarantee for a simple variant of SVRG. The running time almost matches the known guarantees for finding $\epsilon$-first-order stationary points.
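
For context, the variance-reduction idea the abstract builds on works as follows: each epoch fixes a snapshot point, computes the full gradient there once, and then corrects every subsequent stochastic gradient against that snapshot. The Python sketch below illustrates this standard SVRG estimator on a finite-sum objective; it is a minimal illustration only, not the paper's Stabilized SVRG variant, and the function names and least-squares toy objective are assumptions made for the example.

    import numpy as np

    def svrg(grad_i, full_grad, w0, n, epochs=20, inner_steps=None, eta=0.05, rng=None):
        """Minimal SVRG sketch (illustrative, not the paper's Stabilized SVRG).

        grad_i(w, i)  -- gradient of the i-th component function at w
        full_grad(w)  -- gradient of the full finite-sum objective at w
        """
        rng = np.random.default_rng() if rng is None else rng
        inner_steps = n if inner_steps is None else inner_steps
        w_snap = w0.copy()
        for _ in range(epochs):
            g_snap = full_grad(w_snap)      # full gradient at the snapshot point
            w = w_snap.copy()
            for _ in range(inner_steps):
                i = rng.integers(n)
                # variance-reduced estimator: unbiased, and its variance shrinks
                # as the iterate w approaches the snapshot point
                v = grad_i(w, i) - grad_i(w_snap, i) + g_snap
                w = w - eta * v
            w_snap = w                      # new snapshot for the next epoch
        return w_snap

    # toy usage: least-squares finite sum f(w) = (1/n) * sum_i 0.5*(a_i^T w - b_i)^2
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, d = 200, 10
        A = rng.normal(size=(n, d))
        b = rng.normal(size=n)
        grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
        full_grad = lambda w: A.T @ (A @ w - b) / n
        w_out = svrg(grad_i, full_grad, np.zeros(d), n)
        print("final gradient norm:", np.linalg.norm(full_grad(w_out)))

The key line is the estimator v = grad_i(w, i) - grad_i(w_snap, i) + g_snap: it remains an unbiased estimate of the full gradient, while its variance decreases as the current iterate approaches the snapshot, which is what allows the favorable gradient-complexity bounds discussed in the abstract.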

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-ge19a,
  title     = {Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization},
  author    = {Ge, Rong and Li, Zhize and Wang, Weiyao and Wang, Xiang},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {1394--1448},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/ge19a/ge19a.pdf},
  url       = {https://proceedings.mlr.press/v99/ge19a.html}
}
Endnote
%0 Conference Paper
%T Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization
%A Rong Ge
%A Zhize Li
%A Weiyao Wang
%A Xiang Wang
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-ge19a
%I PMLR
%P 1394--1448
%U https://proceedings.mlr.press/v99/ge19a.html
%V 99
APA
Ge, R., Li, Z., Wang, W. & Wang, X. (2019). Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:1394-1448. Available from https://proceedings.mlr.press/v99/ge19a.html.
