No more pesky learning rates

Tom Schaul; Sixin Zhang; Yann LeCun

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):343-351, 2013.

Abstract

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of the best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
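
The idea of setting a learning rate from gradient variation across samples can be illustrated on a one-dimensional noisy quadratic. The sketch below is not taken from this page. It assumes a rate of the form (running mean gradient)^2 / (running mean curvature x running mean squared gradient), with an adaptive averaging window `tau`, which is how the rule is commonly summarised; check it against the paper PDF before relying on it. The function name `vsgd_quadratic`, the bootstrap from `n0` samples, the `slow_start` factor and the numerical guards are all assumptions made for illustration.

```python
import random


def vsgd_quadratic(theta0=5.0, h=1.0, sigma=1.0, steps=2000,
                   n0=10, slow_start=2.0, seed=0):
    """Adaptive-rate SGD on the loss 0.5 * h * (theta - x)**2, x ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    theta = theta0

    # Assumed initialisation: bootstrap the running averages from n0 samples
    # and inflate v_bar by slow_start so the first rates are conservative.
    grads = [h * (theta - rng.gauss(0.0, sigma)) for _ in range(n0)]
    g_bar = sum(grads) / n0
    v_bar = slow_start * sum(g * g for g in grads) / n0
    h_bar = h
    tau = float(n0)

    rates = []
    for _ in range(steps):
        x = rng.gauss(0.0, sigma)
        g = h * (theta - x)  # per-sample gradient; curvature is the constant h

        # Exponential moving averages with window tau.
        w = 1.0 / tau
        g_bar = (1.0 - w) * g_bar + w * g
        v_bar = (1.0 - w) * v_bar + w * g * g
        h_bar = (1.0 - w) * h_bar + w * h

        # Signal-to-noise ratio of the gradient; the epsilon and the clamp
        # are numerical guards added for this sketch.
        ratio = min(g_bar * g_bar / (v_bar + 1e-12), 1.0)
        eta = ratio / h_bar

        # The window shrinks when the gradient is consistent (rate can rise
        # again) and grows when it is mostly noise (rate decays).
        tau = (1.0 - ratio) * tau + 1.0

        theta -= eta * g
        rates.append(eta)
    return theta, rates


if __name__ == "__main__":
    theta, rates = vsgd_quadratic()
    print("final theta:", theta)
    print("first rate:", rates[0], "last rate:", rates[-1])
```

No step size is supplied by the caller: the rate is recomputed at every step from the running statistics and is bounded above by 1 / h_bar.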

Cite this Paper

BibTeX

@InProceedings{pmlr-v28-schaul13,
  title     = {No more pesky learning rates},
  author    = {Schaul, Tom and Zhang, Sixin and LeCun, Yann},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {343--351},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {3},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/schaul13.pdf},
  url       = {https://proceedings.mlr.press/v28/schaul13.html},
  abstract  = {The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of the best settings obtained through systematic search, and effectively removes the need for learning rate tuning.}
}

Endnote

%0 Conference Paper
%T No more pesky learning rates
%A Tom Schaul
%A Sixin Zhang
%A Yann LeCun
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-schaul13
%I PMLR
%P 343--351
%U https://proceedings.mlr.press/v28/schaul13.html
%V 28
%N 3
%X The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of the best settings obtained through systematic search, and effectively removes the need for learning rate tuning.

RIS

TY  - CPAPER
TI  - No more pesky learning rates
AU  - Tom Schaul
AU  - Sixin Zhang
AU  - Yann LeCun
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-schaul13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 343
EP  - 351
L1  - http://proceedings.mlr.press/v28/schaul13.pdf
UR  - https://proceedings.mlr.press/v28/schaul13.html
AB  - The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of the best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
ER  -

APA

Schaul, T., Zhang, S. & LeCun, Y. (2013). No more pesky learning rates. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):343-351. Available from https://proceedings.mlr.press/v28/schaul13.html.
