Combining Conjugate Direction Methods with Stochastic Approximation of Gradients

Nicol N. Schraudolph, Thore Graepel
Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, PMLR R4:248-253, 2003.

Abstract

The method of conjugate directions provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.
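The core idea stated in the abstract — using fast Hessian-gradient products on each mini-batch to build a low-dimensional Krylov subspace, then taking a conjugate-direction-style step inside it — can be illustrated with the minimal sketch below. This is not the authors' published algorithm, only a hedged reconstruction: the quadratic test problem, the subspace dimension m, the damping term, and the exact Gauss-Newton products are assumptions made for the example.

    # Illustrative sketch (assumptions noted above), not the paper's exact method:
    # per mini-batch, build a small Krylov subspace span{g, Hg, ..., H^(m-1)g}
    # from Hessian-vector products, then minimise the local quadratic model
    # of the loss within that subspace.
    import numpy as np

    def krylov_step(w, grad_fn, hvp_fn, m=4, damping=1e-4):
        """One update: minimise the quadratic model over an m-dim Krylov subspace."""
        g = grad_fn(w)
        # Build an orthonormal Krylov basis via Arnoldi-style Gram-Schmidt.
        V = [g / (np.linalg.norm(g) + 1e-12)]
        for _ in range(m - 1):
            v = hvp_fn(w, V[-1])
            for u in V:                      # orthogonalise against previous vectors
                v -= (u @ v) * u
            n = np.linalg.norm(v)
            if n < 1e-10:
                break
            V.append(v / n)
        V = np.stack(V, axis=1)              # shape (dim, k), k <= m
        # Project gradient and Hessian into the subspace; solve the small k-by-k system.
        Hs = np.stack([hvp_fn(w, V[:, i]) for i in range(V.shape[1])], axis=1)
        A = V.T @ Hs + damping * np.eye(V.shape[1])
        b = V.T @ g
        alpha = np.linalg.solve(A, b)        # step coefficients in the Krylov basis
        return w - V @ alpha

    # Toy mini-batch least-squares problem to exercise the sketch (illustrative only).
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(256, 20)), rng.normal(size=256)

    def make_batch_fns(Xb, yb):
        grad = lambda w: Xb.T @ (Xb @ w - yb) / len(yb)
        hvp = lambda w, v: Xb.T @ (Xb @ v) / len(yb)   # exact Hessian product for this quadratic
        return grad, hvp

    w = np.zeros(20)
    for _ in range(50):
        idx = rng.choice(len(y), size=32, replace=False)   # draw a mini-batch
        grad_fn, hvp_fn = make_batch_fns(X[idx], y[idx])
        w = krylov_step(w, grad_fn, hvp_fn)

In a neural-network setting the exact Hessian products used above would be replaced by fast automatic-differentiation Hessian-vector (or Gauss-Newton-vector) products evaluated on the same mini-batch.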

Cite this Paper


BibTeX
@InProceedings{pmlr-vR4-schraudolph03a,
  title     = {Combining Conjugate Direction Methods with Stochastic Approximation of Gradients},
  author    = {Schraudolph, Nicol N. and Graepel, Thore},
  booktitle = {Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics},
  pages     = {248--253},
  year      = {2003},
  editor    = {Bishop, Christopher M. and Frey, Brendan J.},
  volume    = {R4},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Jan},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/r4/schraudolph03a/schraudolph03a.pdf},
  url       = {https://proceedings.mlr.press/r4/schraudolph03a.html},
  abstract  = {The method of conjugate directions provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.},
  note      = {Reissued by PMLR on 01 April 2021.}
}
Endnote
%0 Conference Paper
%T Combining Conjugate Direction Methods with Stochastic Approximation of Gradients
%A Nicol N. Schraudolph
%A Thore Graepel
%B Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2003
%E Christopher M. Bishop
%E Brendan J. Frey
%F pmlr-vR4-schraudolph03a
%I PMLR
%P 248--253
%U https://proceedings.mlr.press/r4/schraudolph03a.html
%V R4
%X The method of conjugate directions provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.
%Z Reissued by PMLR on 01 April 2021.
APA
Schraudolph, N.N. & Graepel, T. (2003). Combining Conjugate Direction Methods with Stochastic Approximation of Gradients. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R4:248-253. Available from https://proceedings.mlr.press/r4/schraudolph03a.html. Reissued by PMLR on 01 April 2021.