First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems

Ching-Pei Lee, Stephen Wright
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3754-3762, 2019.

Abstract

It is well known that both gradient descent and stochastic coordinate descent achieve a global convergence rate of $O(1/k)$ in the objective value, when applied to a scheme for minimizing a Lipschitz-continuously differentiable, unconstrained convex function. In this work, we improve this rate to $o(1/k)$. We extend the result to proximal gradient and proximal coordinate descent on regularized problems to show similar $o(1/k)$ convergence rates. The result is tight in the sense that a rate of $O(1/k^{1+\epsilon})$ is not generally attainable for any $\epsilon>0$, for any of these methods.
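For context (this is standard background, not stated on this page): writing $x_k$ for the $k$-th iterate, $x^*$ for a minimizer, and $L$ for the Lipschitz constant of the gradient (notation introduced here for illustration), the classical guarantee for gradient descent with step size $1/L$ on an $L$-smooth convex $f$ is the $O(1/k)$ bound below; the paper's result strengthens it to the $o(1/k)$ statement that follows.

$$
f(x_k) - f(x^*) \;\le\; \frac{L\,\|x_0 - x^*\|^2}{2k} \qquad \text{(classical } O(1/k) \text{ bound)},
$$
$$
\lim_{k \to \infty} k\,\bigl(f(x_k) - f(x^*)\bigr) = 0 \qquad \text{(the } o(1/k) \text{ rate established in the paper)}.
$$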

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-lee19e,
  title     = {First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems},
  author    = {Lee, Ching-Pei and Wright, Stephen},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {3754--3762},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/lee19e/lee19e.pdf},
  url       = {https://proceedings.mlr.press/v97/lee19e.html}
}
Endnote
%0 Conference Paper
%T First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems
%A Ching-Pei Lee
%A Stephen Wright
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-lee19e
%I PMLR
%P 3754--3762
%U https://proceedings.mlr.press/v97/lee19e.html
%V 97
APA
Lee, C. & Wright, S. (2019). First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3754-3762. Available from https://proceedings.mlr.press/v97/lee19e.html.
