No more pesky learning rates

Tom Schaul, Sixin Zhang, Yann LeCun
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):343-351, 2013.

Abstract

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of the best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
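The idea of deriving a learning rate from gradient variation across samples can be illustrated on a toy problem. The sketch below is a minimal, hedged example for a one-dimensional noisy quadratic. The specific update rules (running averages `g_bar` and `v_bar`, the memory length `tau`, and the rate `g_bar**2 / (h * v_bar)`) are assumptions of this sketch and are not stated in the abstract. The curvature `h` is taken as known, and the function name and parameters are hypothetical.

```python
import random


def adaptive_sgd_quadratic(steps=500, h=2.0, noise=1.0, theta0=5.0, seed=0):
    """Minimise E[0.5 * h * (theta - x)^2] with x ~ N(0, noise^2).

    Illustrative only: the rate is set from the ratio of the squared
    mean gradient to the mean squared gradient (an assumption of this
    sketch), so no learning rate is supplied by the caller.
    """
    rng = random.Random(seed)
    theta = theta0

    # Bootstrap the running statistics from a few sampled gradients.
    init = [h * (theta - rng.gauss(0.0, noise)) for _ in range(10)]
    g_bar = sum(init) / len(init)                 # running mean of gradients
    v_bar = sum(g * g for g in init) / len(init)  # running mean of squared gradients
    tau = float(len(init))                        # effective memory length

    rates = []
    for _ in range(steps):
        g = h * (theta - rng.gauss(0.0, noise))   # stochastic gradient at theta

        # Update both running averages with the same weight 1 / tau.
        g_bar = (1.0 - 1.0 / tau) * g_bar + g / tau
        v_bar = (1.0 - 1.0 / tau) * v_bar + g * g / tau

        # Ratio near 1: gradients agree across samples (signal dominates).
        # Ratio near 0: gradients are mostly noise.
        ratio = min(1.0, g_bar * g_bar / (v_bar + 1e-12))

        eta = ratio / h                           # rate is capped at 1 / h
        theta -= eta * g

        # Shorten the memory when the signal is strong, lengthen it otherwise.
        tau = (1.0 - ratio) * tau + 1.0
        rates.append(eta)

    return theta, rates
```

Calling `adaptive_sgd_quadratic()` returns the final parameter and the sequence of rates used. Because the rate depends on the ratio rather than on a fixed schedule, it can grow again if the gradients start to agree, for example after the optimum moves.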

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-schaul13,
  title = {No more pesky learning rates},
  author = {Tom Schaul and Sixin Zhang and Yann LeCun},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages = {343--351},
  year = {2013},
  editor = {Sanjoy Dasgupta and David McAllester},
  volume = {28},
  number = {3},
  series = {Proceedings of Machine Learning Research},
  address = {Atlanta, Georgia, USA},
  month = {17--19 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v28/schaul13.pdf},
  url = {http://proceedings.mlr.press/v28/schaul13.html},
  abstract = {The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of the best settings obtained through systematic search, and effectively removes the need for learning rate tuning.}
}
Endnote
%0 Conference Paper
%T No more pesky learning rates
%A Tom Schaul
%A Sixin Zhang
%A Yann LeCun
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-schaul13
%I PMLR
%J Proceedings of Machine Learning Research
%P 343--351
%U http://proceedings.mlr.press
%V 28
%N 3
%W PMLR
%X The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of the best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
RIS
TY - CPAPER
TI - No more pesky learning rates
AU - Tom Schaul
AU - Sixin Zhang
AU - Yann LeCun
BT - Proceedings of the 30th International Conference on Machine Learning
PY - 2013/02/13
DA - 2013/02/13
ED - Sanjoy Dasgupta
ED - David McAllester
ID - pmlr-v28-schaul13
PB - PMLR
SP - 343
DP - PMLR
EP - 351
L1 - http://proceedings.mlr.press/v28/schaul13.pdf
UR - http://proceedings.mlr.press/v28/schaul13.html
AB - The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of the best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
ER -
APA
Schaul, T., Zhang, S., & LeCun, Y. (2013). No more pesky learning rates. Proceedings of the 30th International Conference on Machine Learning, in PMLR 28(3):343-351.
