Universal Gradient Methods for Stochastic Convex Optimization

Anton Rodomanov, Ali Kavis, Yongtao Wu, Kimon Antonakopoulos, Volkan Cevher
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:42620-42646, 2024.

Abstract

We develop universal gradient methods for Stochastic Convex Optimization (SCO). Our algorithms automatically adapt not only to the oracle’s noise but also to the Hölder smoothness of the objective function without a priori knowledge of the particular setting. The key ingredient is a novel strategy for adjusting step-size coefficients in the Stochastic Gradient Method (SGD). Unlike AdaGrad, which accumulates gradient norms, our Universal Gradient Method accumulates appropriate combinations of gradient and iterate differences. The resulting algorithm has state-of-the-art worst-case convergence rate guarantees for the entire Hölder class, including, in particular, both nonsmooth functions and those with Lipschitz continuous gradient. We also present the Universal Fast Gradient Method for SCO enjoying optimal efficiency estimates.
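
To make the contrast drawn in the abstract concrete, the following Python sketch compares a scalar AdaGrad-style step size, which shrinks with the accumulated squared gradient norms, against a hypothetical accumulator built from gradient and iterate differences between consecutive iterates. The function names, the accumulator initialization, and the specific difference-based term are assumptions for illustration only; the paper's Universal Gradient Method uses its own step-size coefficients, with the guarantees stated above.

```python
import numpy as np

def adagrad_sgd(grad_oracle, x0, num_steps, base_lr=1.0, eps=1e-8):
    """Scalar AdaGrad-style SGD: the step size is base_lr divided by the
    square root of the accumulated squared stochastic-gradient norms."""
    x = np.asarray(x0, dtype=float).copy()
    acc = 0.0
    for _ in range(num_steps):
        g = grad_oracle(x)                  # stochastic (sub)gradient at x
        acc += float(np.dot(g, g))          # accumulate squared gradient norms
        x = x - base_lr / np.sqrt(acc + eps) * g
    return x

def difference_based_sgd_sketch(grad_oracle, x0, num_steps, D=1.0):
    """Hypothetical sketch: the accumulator is built from gradient and
    iterate differences between consecutive steps rather than from
    gradient norms. The particular combination below (an inner product
    of the two differences) is an assumed stand-in, not the paper's rule."""
    x = np.asarray(x0, dtype=float).copy()
    g = grad_oracle(x)
    acc = float(np.dot(g, g)) + 1e-12       # assumed initialization
    for _ in range(num_steps):
        x_new = x - D / np.sqrt(acc) * g    # D plays the role of a distance scale
        g_new = grad_oracle(x_new)
        dg, dx = g_new - g, x_new - x
        acc += abs(float(np.dot(dg, dx)))   # combination of gradient and iterate differences
        x, g = x_new, g_new
    return x
```

For example, with grad_oracle = lambda x: A @ x - b + 0.1 * np.random.randn(len(x)) for some positive semidefinite A, the two routines can be run side by side to observe how their step sizes decay; only the first is standard AdaGrad, while the second merely mimics the kind of accumulation the abstract describes.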

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-rodomanov24a, title = {Universal Gradient Methods for Stochastic Convex Optimization}, author = {Rodomanov, Anton and Kavis, Ali and Wu, Yongtao and Antonakopoulos, Kimon and Cevher, Volkan}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {42620--42646}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/rodomanov24a/rodomanov24a.pdf}, url = {https://proceedings.mlr.press/v235/rodomanov24a.html}, abstract = {We develop universal gradient methods for Stochastic Convex Optimization (SCO). Our algorithms automatically adapt not only to the oracle’s noise but also to the Hölder smoothness of the objective function without a priori knowledge of the particular setting. The key ingredient is a novel strategy for adjusting step-size coefficients in the Stochastic Gradient Method (SGD). Unlike AdaGrad, which accumulates gradient norms, our Universal Gradient Method accumulates appropriate combinations of gradient and iterate differences. The resulting algorithm has state-of-the-art worst-case convergence rate guarantees for the entire Hölder class, including, in particular, both nonsmooth functions and those with Lipschitz continuous gradient. We also present the Universal Fast Gradient Method for SCO enjoying optimal efficiency estimates.} }
EndNote
%0 Conference Paper %T Universal Gradient Methods for Stochastic Convex Optimization %A Anton Rodomanov %A Ali Kavis %A Yongtao Wu %A Kimon Antonakopoulos %A Volkan Cevher %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-rodomanov24a %I PMLR %P 42620--42646 %U https://proceedings.mlr.press/v235/rodomanov24a.html %V 235 %X We develop universal gradient methods for Stochastic Convex Optimization (SCO). Our algorithms automatically adapt not only to the oracle’s noise but also to the Hölder smoothness of the objective function without a priori knowledge of the particular setting. The key ingredient is a novel strategy for adjusting step-size coefficients in the Stochastic Gradient Method (SGD). Unlike AdaGrad, which accumulates gradient norms, our Universal Gradient Method accumulates appropriate combinations of gradient and iterate differences. The resulting algorithm has state-of-the-art worst-case convergence rate guarantees for the entire Hölder class, including, in particular, both nonsmooth functions and those with Lipschitz continuous gradient. We also present the Universal Fast Gradient Method for SCO enjoying optimal efficiency estimates.
APA
Rodomanov, A., Kavis, A., Wu, Y., Antonakopoulos, K., & Cevher, V. (2024). Universal Gradient Methods for Stochastic Convex Optimization. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:42620-42646. Available from https://proceedings.mlr.press/v235/rodomanov24a.html.