Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?

Ohad Shamir
Proceedings of the 25th Annual Conference on Learning Theory, PMLR 23:47.1-47.3, 2012.

Abstract

Stochastic gradient descent (SGD) is a simple and very popular iterative method to solve stochastic optimization problems which arise in machine learning. A common practice is to return the average of the SGD iterates. While the utility of this is well-understood for general convex problems, the situation is much less clear for strongly convex problems (such as solving SVM). Although the standard analysis in the strongly convex case requires averaging, it was recently shown that this actually degrades the convergence rate, and a better rate is obtainable by averaging just a suffix of the iterates. The question we pose is whether averaging is needed at all to get optimal rates.
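To make the averaging schemes discussed in the abstract concrete, here is a minimal illustrative sketch (not from the paper; the quadratic objective, the step size eta_t = 1/(lambda*t), and the choice of the last half of the iterates as the "suffix" are all assumptions made purely for illustration). It runs SGD on a simple strongly convex problem and compares the three candidate outputs: the last iterate, the average of all iterates, and the average of a suffix of the iterates.

# Illustrative sketch only (not the paper's setup): SGD on the 1-d
# strongly convex objective F(w) = E[(w - z)^2] / 2 with z ~ N(0, 1),
# whose minimizer is w* = 0. Compares three ways of turning the
# iterates into a final answer: last iterate, full average, and the
# average of the last half ("suffix averaging").
import random

def sgd_strongly_convex(T, seed=0):
    rng = random.Random(seed)
    lam = 1.0                            # strong convexity parameter of F
    w = 1.0                              # initial point
    iterates = []
    for t in range(1, T + 1):
        z = rng.gauss(0.0, 1.0)          # stochastic sample
        grad = w - z                     # unbiased stochastic gradient of F at w
        w -= grad / (lam * t)            # standard step size eta_t = 1 / (lam * t)
        iterates.append(w)
    last = iterates[-1]
    full_avg = sum(iterates) / T
    suffix = iterates[T // 2:]           # suffix = last half of the iterates
    suffix_avg = sum(suffix) / len(suffix)
    return last, full_avg, suffix_avg

if __name__ == "__main__":
    for T in (1_000, 100_000):
        last, full_avg, suffix_avg = sgd_strongly_convex(T)
        # For this objective, F(w) - F(w*) = w^2 / 2.
        print(T, last**2 / 2, full_avg**2 / 2, suffix_avg**2 / 2)

On this smooth toy problem all three outputs converge at a similar rate; the degradation caused by averaging all iterates, as highlighted in the abstract, is a worst-case phenomenon and need not appear on a problem this simple. The sketch is only meant to make the three output rules concrete.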

Cite this Paper


BibTeX
@InProceedings{pmlr-v23-shamir12,
  title     = {Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?},
  author    = {Shamir, Ohad},
  booktitle = {Proceedings of the 25th Annual Conference on Learning Theory},
  pages     = {47.1--47.3},
  year      = {2012},
  editor    = {Mannor, Shie and Srebro, Nathan and Williamson, Robert C.},
  volume    = {23},
  series    = {Proceedings of Machine Learning Research},
  address   = {Edinburgh, Scotland},
  month     = {25--27 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v23/shamir12/shamir12.pdf},
  url       = {https://proceedings.mlr.press/v23/shamir12.html},
  abstract  = {Stochastic gradient descent (SGD) is a simple and very popular iterative method to solve stochastic optimization problems which arise in machine learning. A common practice is to return the average of the SGD iterates. While the utility of this is well-understood for general convex problems, the situation is much less clear for strongly convex problems (such as solving SVM). Although the standard analysis in the strongly convex case requires averaging, it was recently shown that this actually degrades the convergence rate, and a better rate is obtainable by averaging just a suffix of the iterates. The question we pose is whether averaging is needed at all to get optimal rates.}
}
APA
Shamir, O. (2012). Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent? Proceedings of the 25th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 23:47.1-47.3. Available from https://proceedings.mlr.press/v23/shamir12.html.
