From Averaging to Acceleration, There is Only a Step-size

Nicolas Flammarion, Francis Bach
Proceedings of The 28th Conference on Learning Theory, PMLR 40:658-695, 2015.

Abstract

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for quadratic non-strongly-convex problems may be reformulated as constant-parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate O(1/n^2), where n is the number of iterations. We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. We also consider the situation where noisy gradients are available; extending our general convergence result to this setting suggests an alternative algorithm (i.e., with different step-sizes) that exhibits the good aspects of both averaging and acceleration.
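
To make the setting concrete, below is a minimal numerical sketch (in Python, assuming NumPy) that runs averaged gradient descent, the heavy-ball method and Nesterov's accelerated gradient descent on a synthetic non-strongly-convex quadratic. It is an illustration only: the quadratic, the step-size gamma = 1/L and the momentum coefficients are assumptions made for the sketch, not the paper's unified second-order reformulation or its tuned constants.

import numpy as np

rng = np.random.default_rng(0)
d, n_iter = 20, 2000

# Quadratic f(theta) = 0.5 * theta^T H theta - b^T theta whose Hessian H has
# eigenvalues approaching zero: convex but not strongly convex.
eigs = np.linspace(1e-4, 1.0, d)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
H = Q @ np.diag(eigs) @ Q.T
theta_star = rng.standard_normal(d)
b = H @ theta_star

def grad(theta):
    return H @ theta - b

def f_gap(theta):
    delta = theta - theta_star
    return 0.5 * delta @ H @ delta  # f(theta) - f(theta_star)

gamma = 1.0 / eigs.max()  # step-size 1/L

# 1) Gradient descent with Polyak-Ruppert averaging of the iterates.
theta, avg = np.zeros(d), np.zeros(d)
for n in range(1, n_iter + 1):
    theta = theta - gamma * grad(theta)
    avg += (theta - avg) / n  # running average of theta_1, ..., theta_n
print("averaged GD   :", f_gap(avg))

# 2) Heavy-ball: a second-order (two-term) recursion with constant momentum;
#    the coefficient 0.9 is an arbitrary illustrative choice.
theta, theta_prev = np.zeros(d), np.zeros(d)
for n in range(n_iter):
    theta, theta_prev = theta - gamma * grad(theta) + 0.9 * (theta - theta_prev), theta
print("heavy-ball    :", f_gap(theta))

# 3) Nesterov's accelerated gradient for convex problems, with the standard
#    (n - 1) / (n + 2) extrapolation coefficient.
theta, theta_prev = np.zeros(d), np.zeros(d)
for n in range(1, n_iter + 1):
    y = theta + (n - 1) / (n + 2) * (theta - theta_prev)
    theta, theta_prev = y - gamma * grad(y), theta
print("accelerated GD:", f_gap(theta))

On such a problem all three runs drive f(theta) - f(theta_star) to zero, and the point of the paper is that these seemingly different schemes can be written as the same kind of constant-parameter second-order recursion, differing only in their step-sizes.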

Cite this Paper


BibTeX
@InProceedings{pmlr-v40-Flammarion15,
  title     = {From Averaging to Acceleration, There is Only a Step-size},
  author    = {Flammarion, Nicolas and Bach, Francis},
  booktitle = {Proceedings of The 28th Conference on Learning Theory},
  pages     = {658--695},
  year      = {2015},
  editor    = {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume    = {40},
  series    = {Proceedings of Machine Learning Research},
  address   = {Paris, France},
  month     = {03--06 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v40/Flammarion15.pdf},
  url       = {https://proceedings.mlr.press/v40/Flammarion15.html},
  abstract  = {We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for quadratic non-strongly-convex problems may be reformulated as constant parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate O(1/n^2), where n is the number of iterations. We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. We also consider the situation where noisy gradients are available, where we extend our general convergence result, which suggests an alternative algorithm (i.e., with different step sizes) that exhibits the good aspects of both averaging and acceleration.}
}
APA
Flammarion, N. & Bach, F. (2015). From Averaging to Acceleration, There is Only a Step-size. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:658-695. Available from https://proceedings.mlr.press/v40/Flammarion15.html.