Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?

[edit]

Ohad Shamir ;
Proceedings of the 25th Annual Conference on Learning Theory, PMLR 23:47.1-47.3, 2012.

Abstract

Stochastic gradient descent (SGD) is a simple and very popular iterative method to solve stochastic optimization problems which arise in machine learning. A common practice is to return the average of the SGD iterates. While the utility of this is well-understood for general convex problems, the situation is much less clear for strongly convex problems (such as solving SVM). Although the standard analysis in the strongly convex case requires averaging, it was recently shown that this actually degrades the convergence rate, and a better rate is obtainable by averaging just a suffix of the iterates. The question we pose is whether averaging is needed at all to get optimal rates.

Related Material