Iterate Averaging as Regularization for Stochastic Gradient Descent

Gergely Neu; Lorenzo Rosasco

Iterate Averaging as Regularization for Stochastic Gradient Descent

Gergely Neu, Lorenzo Rosasco

Proceedings of the 31st Conference On Learning Theory, PMLR 75:3222-3242, 2018.

Abstract

We propose and analyze a variant of the classic Polyak–Ruppert averaging scheme, broadly used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least-squares regression, we show that this averaging scheme has the same regularizing effect, and indeed is asymptotically equivalent, to ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.

Cite this Paper

BibTeX


@InProceedings{pmlr-v75-neu18a,
  title = 	 {Iterate Averaging as Regularization for Stochastic Gradient Descent},
  author =       {Neu, Gergely and Rosasco, Lorenzo},
  booktitle = 	 {Proceedings of the 31st  Conference On Learning Theory},
  pages = 	 {3222--3242},
  year = 	 {2018},
  editor = 	 {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = 	 {75},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v75/neu18a/neu18a.pdf},
  url = 	 {https://proceedings.mlr.press/v75/neu18a.html},
  abstract = 	 {We propose and analyze a variant of the classic  Polyak–Ruppert averaging scheme, broadly used in stochastic gradient methods.  Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least-squares regression, we show that this averaging scheme has the same regularizing effect, and indeed is asymptotically equivalent, to ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.}
}

Endnote

%0 Conference Paper
%T Iterate Averaging as Regularization for Stochastic Gradient Descent
%A Gergely Neu
%A Lorenzo Rosasco
%B Proceedings of the 31st  Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet	
%F pmlr-v75-neu18a
%I PMLR
%P 3222--3242
%U https://proceedings.mlr.press/v75/neu18a.html
%V 75
%X We propose and analyze a variant of the classic  Polyak–Ruppert averaging scheme, broadly used in stochastic gradient methods.  Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least-squares regression, we show that this averaging scheme has the same regularizing effect, and indeed is asymptotically equivalent, to ridge regression. In particular, we derive finite-sample bounds for the proposed approach that match the best known results for regularized stochastic gradient methods.

APA


Neu, G. & Rosasco, L.. (2018). Iterate Averaging as Regularization for Stochastic Gradient Descent. Proceedings of the 31st  Conference On Learning Theory, in Proceedings of Machine Learning Research 75:3222-3242 Available from https://proceedings.mlr.press/v75/neu18a.html.

Related Material

Download PDF