Accelerating Proximal Gradient Descent via Silver Stepsizes

Jinho Bok, Jason M. Altschuler
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:421-453, 2025.

Abstract

Surprisingly, recent work has shown that gradient descent can be accelerated without using momentum—just by judiciously choosing stepsizes. An open question raised by several papers is whether this phenomenon of stepsize-based acceleration holds more generally for constrained and/or composite convex optimization via projected and/or proximal versions of gradient descent. We answer this in the affirmative by proving that the silver stepsize schedule yields analogously accelerated rates in these settings. These rates are conjectured to be asymptotically optimal among all stepsize schedules, and match the silver convergence rate of vanilla gradient descent (Altschuler and Parrilo, 2024, 2025), namely $O(\varepsilon^{-\log_{\rho} 2})$ for smooth convex optimization and $O(\kappa^{\log_\rho 2} \log 1/\varepsilon)$ under strong convexity, where $\varepsilon$ is the precision, $\kappa$ is the condition number, and $\rho = 1 + \sqrt{2}$ is the silver ratio. The key technical insight is the combination of recursive gluing—the technique underlying all analyses of gradient descent accelerated with time-varying stepsizes—with a certain Laplacian-structured sum-of-squares certificate for the analysis of proximal point updates.
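
As an illustration of the method analyzed in the paper, the following is a minimal sketch of proximal gradient descent run with silver stepsizes on a lasso-type composite objective $f(x)+g(x)=\tfrac{1}{2}\|Ax-b\|^2+\lambda\|x\|_1$. The closed-form stepsize $h_t = 1+\rho^{\nu(t)-1}$, with $\nu(t)$ the 2-adic valuation of $t$, is the standard definition of the silver schedule of Altschuler and Parrilo for the smooth convex setting; everything else here (problem data, function names, the soft-thresholding prox) is an illustrative assumption rather than code from the paper, and the strongly convex variant of the schedule is not shown.

# Minimal sketch (not from the paper): proximal gradient descent with the
# silver stepsize schedule on an illustrative lasso problem.
import numpy as np

RHO = 1 + np.sqrt(2)  # the silver ratio

def silver_stepsize(t: int) -> float:
    """Unnormalized silver stepsize h_t = 1 + rho^(nu(t)-1), for t = 1, 2, ..."""
    nu = 0
    while t % 2 == 0:  # nu = 2-adic valuation of t
        t //= 2
        nu += 1
    return 1 + RHO ** (nu - 1)

def proximal_gradient_silver(grad_f, prox_g, x0, L, n_iters):
    """x_{t+1} = prox_{(h_t/L) g}(x_t - (h_t/L) grad f(x_t)) with silver stepsizes h_t."""
    x = x0.copy()
    for t in range(1, n_iters + 1):
        eta = silver_stepsize(t) / L  # silver stepsizes are multiples of 1/L
        x = prox_g(x - eta * grad_f(x), eta)
    return x

# Illustrative composite problem: f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((40, 20)), rng.standard_normal(40), 0.1
L = np.linalg.norm(A, 2) ** 2                 # smoothness constant of f (spectral norm squared)
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, eta: np.sign(v) * np.maximum(np.abs(v) - lam * eta, 0.0)  # soft-thresholding

x_hat = proximal_gradient_silver(grad_f, prox_g, np.zeros(20), L, n_iters=127)  # 2^7 - 1 steps
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2 + lam * np.abs(x_hat).sum())

The horizon 127 = 2^7 - 1 reflects the recursive-gluing construction of the schedule, which is most naturally defined at lengths of the form 2^k - 1.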

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-bok25a,
  title     = {Accelerating Proximal Gradient Descent via Silver Stepsizes},
  author    = {Bok, Jinho and Altschuler, Jason M.},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {421--453},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/bok25a/bok25a.pdf},
  url       = {https://proceedings.mlr.press/v291/bok25a.html},
  abstract  = {Surprisingly, recent work has shown that gradient descent can be accelerated without using momentum—just by judiciously choosing stepsizes. An open question raised by several papers is whether this phenomenon of stepsize-based acceleration holds more generally for constrained and/or composite convex optimization via projected and/or proximal versions of gradient descent. We answer this in the affirmative by proving that the silver stepsize schedule yields analogously accelerated rates in these settings. These rates are conjectured to be asymptotically optimal among all stepsize schedules, and match the silver convergence rate of vanilla gradient descent (Altschuler and Parrilo, 2024, 2025), namely $O(\varepsilon^{-\log_{\rho} 2})$ for smooth convex optimization and $O(\kappa^{\log_\rho 2} \log 1/\varepsilon)$ under strong convexity, where $\varepsilon$ is the precision, $\kappa$ is the condition number, and $\rho = 1 + \sqrt{2}$ is the silver ratio. The key technical insight is the combination of recursive gluing—the technique underlying all analyses of gradient descent accelerated with time-varying stepsizes—with a certain Laplacian-structured sum-of-squares certificate for the analysis of proximal point updates.}
}
Endnote
%0 Conference Paper
%T Accelerating Proximal Gradient Descent via Silver Stepsizes
%A Jinho Bok
%A Jason M. Altschuler
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-bok25a
%I PMLR
%P 421--453
%U https://proceedings.mlr.press/v291/bok25a.html
%V 291
%X Surprisingly, recent work has shown that gradient descent can be accelerated without using momentum—just by judiciously choosing stepsizes. An open question raised by several papers is whether this phenomenon of stepsize-based acceleration holds more generally for constrained and/or composite convex optimization via projected and/or proximal versions of gradient descent. We answer this in the affirmative by proving that the silver stepsize schedule yields analogously accelerated rates in these settings. These rates are conjectured to be asymptotically optimal among all stepsize schedules, and match the silver convergence rate of vanilla gradient descent (Altschuler and Parrilo, 2024, 2025), namely $O(\varepsilon^{-\log_{\rho} 2})$ for smooth convex optimization and $O(\kappa^{\log_\rho 2} \log 1/\varepsilon)$ under strong convexity, where $\varepsilon$ is the precision, $\kappa$ is the condition number, and $\rho = 1 + \sqrt{2}$ is the silver ratio. The key technical insight is the combination of recursive gluing—the technique underlying all analyses of gradient descent accelerated with time-varying stepsizes—with a certain Laplacian-structured sum-of-squares certificate for the analysis of proximal point updates.
APA
Bok, J. & Altschuler, J. M. (2025). Accelerating Proximal Gradient Descent via Silver Stepsizes. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:421-453. Available from https://proceedings.mlr.press/v291/bok25a.html.
