Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?

Ohad Shamir

Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?

Ohad Shamir

Proceedings of the 25th Annual Conference on Learning Theory, PMLR 23:47.1-47.3, 2012.

Abstract

Stochastic gradient descent (SGD) is a simple and very popular iterative method to solve stochastic optimization problems which arise in machine learning. A common practice is to return the average of the SGD iterates. While the utility of this is well-understood for general convex problems, the situation is much less clear for strongly convex problems (such as solving SVM). Although the standard analysis in the strongly convex case requires averaging, it was recently shown that this actually degrades the convergence rate, and a better rate is obtainable by averaging just a suffix of the iterates. The question we pose is whether averaging is needed at all to get optimal rates.

Cite this Paper

BibTeX


@InProceedings{pmlr-v23-shamir12,
  title = 	 {Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?},
  author = 	 {Shamir, Ohad},
  booktitle = 	 {Proceedings of the 25th Annual Conference on Learning Theory},
  pages = 	 {47.1--47.3},
  year = 	 {2012},
  editor = 	 {Mannor, Shie and Srebro, Nathan and Williamson, Robert C.},
  volume = 	 {23},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Edinburgh, Scotland},
  month = 	 {25--27 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v23/shamir12/shamir12.pdf},
  url = 	 {https://proceedings.mlr.press/v23/shamir12.html},
  abstract = 	 {Stochastic gradient descent (SGD) is a simple and very popular iterative method to solve stochastic optimization problems which arise in machine learning. A common practice is to return the average of the SGD iterates. While the utility of this is well-understood for general convex problems, the situation is much less clear for strongly convex problems (such as solving SVM). Although the standard analysis in the strongly convex case requires averaging, it was recently shown that this actually degrades the convergence rate, and a better rate is obtainable by averaging just a suffix of the iterates. The question we pose is whether averaging is needed at all to get optimal rates.}
}

Endnote

%0 Conference Paper
%T Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?
%A Ohad Shamir
%B Proceedings of the 25th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2012
%E Shie Mannor
%E Nathan Srebro
%E Robert C. Williamson	
%F pmlr-v23-shamir12
%I PMLR
%P 47.1--47.3
%U https://proceedings.mlr.press/v23/shamir12.html
%V 23
%X Stochastic gradient descent (SGD) is a simple and very popular iterative method to solve stochastic optimization problems which arise in machine learning. A common practice is to return the average of the SGD iterates. While the utility of this is well-understood for general convex problems, the situation is much less clear for strongly convex problems (such as solving SVM). Although the standard analysis in the strongly convex case requires averaging, it was recently shown that this actually degrades the convergence rate, and a better rate is obtainable by averaging just a suffix of the iterates. The question we pose is whether averaging is needed at all to get optimal rates.

RIS


TY  - CPAPER
TI  - Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?
AU  - Ohad Shamir
BT  - Proceedings of the 25th Annual Conference on Learning Theory
DA  - 2012/06/16
ED  - Shie Mannor
ED  - Nathan Srebro
ED  - Robert C. Williamson	
ID  - pmlr-v23-shamir12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 23
SP  - 47.1
EP  - 47.3
L1  - http://proceedings.mlr.press/v23/shamir12/shamir12.pdf
UR  - https://proceedings.mlr.press/v23/shamir12.html
AB  - Stochastic gradient descent (SGD) is a simple and very popular iterative method to solve stochastic optimization problems which arise in machine learning. A common practice is to return the average of the SGD iterates. While the utility of this is well-understood for general convex problems, the situation is much less clear for strongly convex problems (such as solving SVM). Although the standard analysis in the strongly convex case requires averaging, it was recently shown that this actually degrades the convergence rate, and a better rate is obtainable by averaging just a suffix of the iterates. The question we pose is whether averaging is needed at all to get optimal rates.
ER  -

APA


Shamir, O.. (2012). Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?. Proceedings of the 25th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 23:47.1-47.3 Available from https://proceedings.mlr.press/v23/shamir12.html.

Related Material

Download PDF