Accelerating Stochastic Gradient Descent for Least Squares Regression

Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford
; Proceedings of the 31st Conference On Learning Theory, PMLR 75:545-604, 2018.

Abstract

There is widespread sentiment that fast gradient methods (e.g. Nesterov’s acceleration, conjugate gradient, heavy ball) are not effective for the purposes of stochastic optimization due to their instability and error accumulation. Numerous works have attempted to quantify these instabilities in the face of either statistical or non-statistical. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes this conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this characterization gives insights towards the broader question of designing simple and effective accelerated stochastic methods for more general convex and non-convex optimization problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-jain18a, title = {Accelerating Stochastic Gradient Descent for Least Squares Regression}, author = {Jain, Prateek and Kakade, Sham M. and Kidambi, Rahul and Netrapalli, Praneeth and Sidford, Aaron}, booktitle = {Proceedings of the 31st Conference On Learning Theory}, pages = {545--604}, year = {2018}, editor = {Sébastien Bubeck and Vianney Perchet and Philippe Rigollet}, volume = {75}, series = {Proceedings of Machine Learning Research}, address = {}, month = {06--09 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v75/jain18a/jain18a.pdf}, url = {http://proceedings.mlr.press/v75/jain18a.html}, abstract = {There is widespread sentiment that fast gradient methods (e.g. Nesterov’s acceleration, conjugate gradient, heavy ball) are not effective for the purposes of stochastic optimization due to their instability and error accumulation. Numerous works have attempted to quantify these instabilities in the face of either statistical or non-statistical. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes this conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this characterization gives insights towards the broader question of designing simple and effective accelerated stochastic methods for more general convex and non-convex optimization problems.} }
Endnote
%0 Conference Paper %T Accelerating Stochastic Gradient Descent for Least Squares Regression %A Prateek Jain %A Sham M. Kakade %A Rahul Kidambi %A Praneeth Netrapalli %A Aaron Sidford %B Proceedings of the 31st Conference On Learning Theory %C Proceedings of Machine Learning Research %D 2018 %E Sébastien Bubeck %E Vianney Perchet %E Philippe Rigollet %F pmlr-v75-jain18a %I PMLR %J Proceedings of Machine Learning Research %P 545--604 %U http://proceedings.mlr.press %V 75 %W PMLR %X There is widespread sentiment that fast gradient methods (e.g. Nesterov’s acceleration, conjugate gradient, heavy ball) are not effective for the purposes of stochastic optimization due to their instability and error accumulation. Numerous works have attempted to quantify these instabilities in the face of either statistical or non-statistical. This work considers these issues for the special case of stochastic approximation for the least squares regression problem, and our main result refutes this conventional wisdom by showing that acceleration can be made robust to statistical errors. In particular, this work introduces an accelerated stochastic gradient method that provably achieves the minimax optimal statistical risk faster than stochastic gradient descent. Critical to the analysis is a sharp characterization of accelerated stochastic gradient descent as a stochastic process. We hope this characterization gives insights towards the broader question of designing simple and effective accelerated stochastic methods for more general convex and non-convex optimization problems.
APA
Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P. & Sidford, A.. (2018). Accelerating Stochastic Gradient Descent for Least Squares Regression. Proceedings of the 31st Conference On Learning Theory, in PMLR 75:545-604

Related Material