[edit]
First-Order Algorithms Converge Faster than O(1/k) on Convex Problems
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3754-3762, 2019.
Abstract
It is well known that both gradient descent and stochastic coordinate descent achieve a global convergence rate of O(1/k) in the objective value, when applied to a scheme for minimizing a Lipschitz-continuously differentiable, unconstrained convex function. In this work, we improve this rate to o(1/k). We extend the result to proximal gradient and proximal coordinate descent on regularized problems to show similar o(1/k) convergence rates. The result is tight in the sense that a rate of O(1/k1+ϵ) is not generally attainable for any ϵ>0, for any of these methods.