Stochastic Variance Reduction for Nonconvex Optimization

Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:314-323, 2016.

Abstract

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to minibatching in parallel settings.
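For readers who want a concrete picture of the estimator the abstract refers to, below is a minimal Python sketch of SVRG applied to a nonconvex finite sum f(x) = (1/n) sum_i f_i(x). The function name svrg_nonconvex, the gradient oracle grad_fi, and the default step size and epoch length are illustrative assumptions, not the parameter choices analyzed in the paper.

import numpy as np

def svrg_nonconvex(grad_fi, x0, n, n_epochs=20, m=None, eta=0.01, rng=None):
    """Illustrative SVRG sketch for a nonconvex finite sum f(x) = (1/n) * sum_i f_i(x).

    grad_fi(x, i) is assumed to return the gradient of the i-th component f_i at x.
    The step size eta and inner-loop length m are placeholder defaults, not the
    theoretically prescribed values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = n if m is None else m                      # inner-loop length per epoch
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_epochs):
        x_snap = x.copy()
        # Full gradient at the snapshot point (computed once per epoch).
        full_grad = np.mean([grad_fi(x_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced gradient estimator: unbiased for the full gradient,
            # with variance that shrinks as x approaches the snapshot x_snap.
            v = grad_fi(x, i) - grad_fi(x_snap, i) + full_grad
            x = x - eta * v
    return x

Replacing the single sampled index i with an average of the same estimator over a sampled mini-batch gives the mini-batch variant mentioned in the abstract.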

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-reddi16,
  title     = {Stochastic Variance Reduction for Nonconvex Optimization},
  author    = {Reddi, Sashank J. and Hefny, Ahmed and Sra, Suvrit and Poczos, Barnabas and Smola, Alex},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {314--323},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/reddi16.pdf},
  url       = {https://proceedings.mlr.press/v48/reddi16.html},
  abstract  = {We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to minibatching in parallel settings.}
}
APA
Reddi, S.J., Hefny, A., Sra, S., Poczos, B. & Smola, A. (2016). Stochastic Variance Reduction for Nonconvex Optimization. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:314-323. Available from https://proceedings.mlr.press/v48/reddi16.html.

Related Material