Iterate Averaging as Regularization for Stochastic Gradient Descent

Gergely Neu, Lorenzo Rosasco
Proceedings of the 31st Conference On Learning Theory, PMLR 75:3222-3242, 2018.

Abstract

We propose and analyze a variant of the classic Polyak–Ruppert averaging scheme, widely used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying geometrically. In the context of linear least-squares regression, we show that this averaging scheme has the same regularizing effect as, and is indeed asymptotically equivalent to, ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.
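As a concrete illustration of the averaging scheme described above, the sketch below runs plain SGD on a least-squares objective and maintains, alongside the last iterate, a geometrically weighted average of the iterates (an exponential moving average). The step size, decay factor, and the specific moving-average form are illustrative assumptions, not necessarily the exact weighting or analysis used in the paper; the connection to ridge regression depends on how the decay and step size are chosen.

import numpy as np

def sgd_geometric_average(X, y, step_size=0.01, decay=0.99, n_passes=5, seed=0):
    """SGD for linear least squares with a geometrically weighted iterate average.

    Illustrative sketch: the running average assigns weight proportional to
    decay**(T - t) to iterate t, so recent iterates count more. This is one
    plausible instantiation of 'weights decaying geometrically'.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)        # current SGD iterate
    w_avg = np.zeros(d)    # geometrically weighted average of iterates
    for _ in range(n_passes):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]            # stochastic gradient of 0.5*(x.w - y)^2
            w = w - step_size * grad                   # plain SGD step
            w_avg = decay * w_avg + (1.0 - decay) * w  # exponential moving average
    return w, w_avg

# Tiny usage example on synthetic data
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5))
    w_true = rng.standard_normal(5)
    y = X @ w_true + 0.1 * rng.standard_normal(200)
    w_last, w_geo = sgd_geometric_average(X, y)
    print("last iterate error:", np.linalg.norm(w_last - w_true))
    print("geometric average error:", np.linalg.norm(w_geo - w_true))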

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-neu18a,
  title = {Iterate Averaging as Regularization for Stochastic Gradient Descent},
  author = {Neu, Gergely and Rosasco, Lorenzo},
  booktitle = {Proceedings of the 31st Conference On Learning Theory},
  pages = {3222--3242},
  year = {2018},
  editor = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = {75},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v75/neu18a/neu18a.pdf},
  url = {https://proceedings.mlr.press/v75/neu18a.html},
  abstract = {We propose and analyze a variant of the classic Polyak–Ruppert averaging scheme, widely used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying geometrically. In the context of linear least-squares regression, we show that this averaging scheme has the same regularizing effect as, and is indeed asymptotically equivalent to, ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.}
}
Endnote
%0 Conference Paper
%T Iterate Averaging as Regularization for Stochastic Gradient Descent
%A Gergely Neu
%A Lorenzo Rosasco
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet
%F pmlr-v75-neu18a
%I PMLR
%P 3222--3242
%U https://proceedings.mlr.press/v75/neu18a.html
%V 75
%X We propose and analyze a variant of the classic Polyak–Ruppert averaging scheme, widely used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying geometrically. In the context of linear least-squares regression, we show that this averaging scheme has the same regularizing effect as, and is indeed asymptotically equivalent to, ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.
APA
Neu, G., & Rosasco, L. (2018). Iterate Averaging as Regularization for Stochastic Gradient Descent. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:3222-3242. Available from https://proceedings.mlr.press/v75/neu18a.html.
