Convergence diagnostics for stochastic gradient descent with constant learning rate

Jerry Chee, Panos Toulis
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1476-1485, 2018.

Abstract

Many iterative procedures in stochastic optimization exhibit a transient phase followed by a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in that region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such a phase transition in the context of stochastic gradient descent with constant learning rate. We present theory and experiments suggesting that the region where the proposed diagnostic is activated coincides with the convergence region. For a class of loss functions, we derive a closed-form solution describing this region. Finally, we suggest an application to speed up convergence of stochastic gradient descent by halving the learning rate each time stationarity is detected. This leads to a new variant of stochastic gradient descent, which in many settings is comparable to the state of the art.
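
The halving scheme described in the abstract can be illustrated with a small sketch. The abstract does not state the test statistic, so the code below assumes an inner-product check in the style of Pflug's procedure: it accumulates the inner products of successive stochastic gradients and declares stationarity once the running sum turns negative after a burn-in period. The names `sgd_with_halving`, `make_noisy_quadratic_grad`, and the `burn_in` parameter are hypothetical and chosen for illustration; this is not the authors' reference implementation.

```python
import random


def sgd_with_halving(grad, theta0, lr, n_iters, burn_in=50):
    """Constant-rate SGD that halves lr when a stationarity check fires.

    Assumed diagnostic (not taken from the abstract): the running sum of
    inner products of successive stochastic gradients. In the transient
    phase successive gradients tend to point the same way (positive
    products); in the stationary phase the iterates oscillate and the
    sum tends to drift negative.
    """
    theta = list(theta0)
    prev_grad = None
    running_sum = 0.0
    steps_in_phase = 0
    halvings = 0
    for _ in range(n_iters):
        g = grad(theta)
        # Plain SGD step with the current constant learning rate.
        theta = [t - lr * gi for t, gi in zip(theta, g)]
        if prev_grad is not None:
            running_sum += sum(a * b for a, b in zip(g, prev_grad))
        prev_grad = g
        steps_in_phase += 1
        # Declare stationarity only after the (hypothetical) burn-in.
        if steps_in_phase > burn_in and running_sum < 0.0:
            lr /= 2.0
            halvings += 1
            # Restart the diagnostic for the new learning rate.
            prev_grad = None
            running_sum = 0.0
            steps_in_phase = 0
    return theta, lr, halvings


def make_noisy_quadratic_grad(target, noise_sd, rng):
    """Stochastic gradient of 0.5 * ||theta - x||^2 with x ~ N(target, noise_sd^2)."""
    def grad(theta):
        return [t - (m + rng.gauss(0.0, noise_sd)) for t, m in zip(theta, target)]
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    grad = make_noisy_quadratic_grad([3.0, -1.0], 1.0, rng)
    theta, final_lr, halvings = sgd_with_halving(grad, [0.0, 0.0], lr=0.5, n_iters=5000)
    print("estimate:", theta)
    print("final learning rate:", final_lr, "after", halvings, "halvings")
```

On this toy problem the check fires repeatedly, so the learning rate shrinks quickly. In practice one would likely want a floor on the learning rate or a longer burn-in; both are design choices outside what the abstract specifies.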

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-chee18a,
  title = {Convergence diagnostics for stochastic gradient descent with constant learning rate},
  author = {Jerry Chee and Panos Toulis},
  booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
  pages = {1476--1485},
  year = {2018},
  editor = {Amos Storkey and Fernando Perez-Cruz},
  volume = {84},
  series = {Proceedings of Machine Learning Research},
  address = {Playa Blanca, Lanzarote, Canary Islands},
  month = {09--11 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v84/chee18a/chee18a.pdf},
  url = {http://proceedings.mlr.press/v84/chee18a.html},
  abstract = {Many iterative procedures in stochastic optimization exhibit a transient phase followed by a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in that region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transition in the context of stochastic gradient descent with constant learning rate. We present theory and experiments suggesting that the region where the proposed diagnostic is activated coincides with the convergence region. For a class of loss functions, we derive a closed-form solution describing such region. Finally, we suggest an application to speed up convergence of stochastic gradient descent by halving the learning rate each time stationarity is detected. This leads to a new variant of stochastic gradient descent, which in many settings is comparable to state-of-art.}
}
Endnote
%0 Conference Paper
%T Convergence diagnostics for stochastic gradient descent with constant learning rate
%A Jerry Chee
%A Panos Toulis
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-chee18a
%I PMLR
%J Proceedings of Machine Learning Research
%P 1476--1485
%U http://proceedings.mlr.press
%V 84
%W PMLR
%X Many iterative procedures in stochastic optimization exhibit a transient phase followed by a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in that region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transition in the context of stochastic gradient descent with constant learning rate. We present theory and experiments suggesting that the region where the proposed diagnostic is activated coincides with the convergence region. For a class of loss functions, we derive a closed-form solution describing such region. Finally, we suggest an application to speed up convergence of stochastic gradient descent by halving the learning rate each time stationarity is detected. This leads to a new variant of stochastic gradient descent, which in many settings is comparable to state-of-art.
APA
Chee, J. & Toulis, P. (2018). Convergence diagnostics for stochastic gradient descent with constant learning rate. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in PMLR 84:1476-1485.
