Open Problem: Tight Convergence of SGD in Constant Dimension

Tomer Koren, Shahar Segal
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:3847-3851, 2020.

Abstract

Stochastic Gradient Descent (SGD) is one of the most popular optimization methods in machine learning and has been studied extensively since the early 1950s. However, our understanding of this fundamental algorithm is still lacking in certain aspects. We point out a gap that remains between the known upper and lower bounds for the expected suboptimality of the last SGD point whenever the dimension is a constant independent of the number of SGD iterations $T$; in particular, the gap is still unaddressed even in the one-dimensional case. For the latter, we provide evidence that the correct rate is $\Theta(1/\sqrt{T})$ and conjecture that the same applies in any (constant) dimension.
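As a purely illustrative aid (not part of the paper), the following Python sketch runs projected SGD on the one-dimensional function f(x) = |x| over a bounded interval with step sizes proportional to 1/sqrt(t), and compares the empirical last-iterate suboptimality against the conjectured 1/sqrt(T) rate and the log(T)/sqrt(T) scaling. The choice of objective, noise model, step-size constant, and number of repetitions are assumptions made here for illustration only.

# Minimal simulation sketch (illustrative assumptions, not from the paper):
# compare the last-iterate suboptimality of one-dimensional projected SGD
# against 1/sqrt(T) and log(T)/sqrt(T).
import math
import random

def sgd_last_iterate(T, noise_std=1.0, radius=1.0, seed=0):
    """Run T steps of projected SGD on f(x) = |x| over [-radius, radius]
    with step sizes eta_t = radius / sqrt(t); return the last iterate."""
    rng = random.Random(seed)
    x = radius  # start at the boundary, far from the minimizer x* = 0
    for t in range(1, T + 1):
        # noisy subgradient of |x| plus Gaussian noise
        grad = (1.0 if x > 0 else -1.0) + rng.gauss(0.0, noise_std)
        x -= (radius / math.sqrt(t)) * grad
        x = max(-radius, min(radius, x))  # project back onto [-radius, radius]
    return x

if __name__ == "__main__":
    for T in (100, 1000, 10000, 100000):
        # average f(x_T) - f(x*) = |x_T| over independent runs
        runs = 20
        subopt = sum(abs(sgd_last_iterate(T, seed=s)) for s in range(runs)) / runs
        print(f"T={T:>6}  avg suboptimality={subopt:.4f}  "
              f"1/sqrt(T)={1/math.sqrt(T):.4f}  "
              f"log(T)/sqrt(T)={math.log(T)/math.sqrt(T):.4f}")

Such a simulation can only suggest which of the two rates the last iterate follows; resolving the question formally is precisely the open problem posed in the paper.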

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-koren20a,
  title     = {Open Problem: Tight Convergence of SGD in Constant Dimension},
  author    = {Koren, Tomer and Segal, Shahar},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages     = {3847--3851},
  year      = {2020},
  editor    = {Abernethy, Jacob and Agarwal, Shivani},
  volume    = {125},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v125/koren20a/koren20a.pdf},
  url       = {https://proceedings.mlr.press/v125/koren20a.html},
  abstract  = {Stochastic Gradient Descent (SGD) is one of the most popular optimization methods in machine learning and has been studied extensively since the early 50's. However, our understanding of this fundamental algorithm is still lacking in certain aspects. We point out to a gap that remains between the known upper and lower bounds for the expected suboptimality of the last SGD point whenever the dimension is a constant independent of the number of SGD iterations $T$, and in particular, that the gap is still unaddressed even in the one dimensional case. For the latter, we provide evidence that the correct rate is $\Theta(1/\sqrt{T})$ and conjecture that the same applies in any (constant) dimension.}
}
Endnote
%0 Conference Paper
%T Open Problem: Tight Convergence of SGD in Constant Dimension
%A Tomer Koren
%A Shahar Segal
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-koren20a
%I PMLR
%P 3847--3851
%U https://proceedings.mlr.press/v125/koren20a.html
%V 125
%X Stochastic Gradient Descent (SGD) is one of the most popular optimization methods in machine learning and has been studied extensively since the early 50's. However, our understanding of this fundamental algorithm is still lacking in certain aspects. We point out to a gap that remains between the known upper and lower bounds for the expected suboptimality of the last SGD point whenever the dimension is a constant independent of the number of SGD iterations $T$, and in particular, that the gap is still unaddressed even in the one dimensional case. For the latter, we provide evidence that the correct rate is $\Theta(1/\sqrt{T})$ and conjecture that the same applies in any (constant) dimension.
APA
Koren, T. & Segal, S. (2020). Open Problem: Tight Convergence of SGD in Constant Dimension. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:3847-3851. Available from https://proceedings.mlr.press/v125/koren20a.html.
