Adaptive Proximal Gradient Methods Are Universal Without Approximation

Konstantinos Oikonomidis, Emanuel Laude, Puya Latafat, Andreas Themelis, Panagiotis Patrinos
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:38663-38682, 2024.

Abstract

We show that adaptive proximal gradient methods for convex problems are not restricted to traditional Lipschitzian assumptions. Our analysis reveals that a class of linesearch-free methods is still convergent under mere local Hölder gradient continuity, covering in particular continuously differentiable semi-algebraic functions. To mitigate the lack of local Lipschitz continuity, popular approaches revolve around $\varepsilon$-oracles and/or linesearch procedures. In contrast, we exploit plain Hölder inequalities not entailing any approximation, all while retaining the linesearch-free nature of adaptive schemes. Furthermore, we prove full sequence convergence without prior knowledge of the local Hölder constants or of the order of Hölder continuity. Numerical experiments compare against baseline methods on diverse machine learning tasks, covering both the locally and the globally Hölder setting.
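
As a rough illustration of the linesearch-free idea discussed in the abstract, the sketch below runs a proximal gradient iteration on a lasso problem where the stepsize is estimated from observed local curvature instead of a known Lipschitz or Hölder constant. The problem data, helper names, and the particular stepsize update are illustrative assumptions only; they are not the specific update rule analyzed in the paper.

```python
import numpy as np

# Minimal sketch of a linesearch-free (adaptive-stepsize) proximal gradient
# method for min_x f(x) + g(x), with f(x) = 0.5*||Ax - b||^2 and
# g(x) = lam*||x||_1 (prox = soft-thresholding). The stepsize rule below is a
# generic local-curvature estimate for illustration only; it is NOT the
# specific rule analyzed in the paper.

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def adaptive_prox_grad(A, b, lam, x0, gamma0=1e-2, iters=500):
    x_prev = x0.copy()
    grad_prev = A.T @ (A @ x_prev - b)
    gamma_prev = gamma = gamma0
    # first proximal gradient step with the initial stepsize
    x = soft_threshold(x_prev - gamma * grad_prev, gamma * lam)
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        dx, dg = x - x_prev, grad - grad_prev
        # local curvature estimate ||dg|| / ||dx||: no global Lipschitz
        # constant and no backtracking linesearch are used
        L_loc = np.linalg.norm(dg) / max(np.linalg.norm(dx), 1e-16)
        gamma_new = min(gamma * np.sqrt(1.0 + gamma / gamma_prev),
                        1.0 / (2.0 * max(L_loc, 1e-16)))
        gamma_prev, gamma = gamma, gamma_new
        x_prev, grad_prev = x, grad
        x = soft_threshold(x - gamma * grad, gamma * lam)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 100))
    b = rng.standard_normal(50)
    x = adaptive_prox_grad(A, b, lam=0.1, x0=np.zeros(100))
    print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2 + 0.1 * np.abs(x).sum())
```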

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-oikonomidis24a,
  title     = {Adaptive Proximal Gradient Methods Are Universal Without Approximation},
  author    = {Oikonomidis, Konstantinos and Laude, Emanuel and Latafat, Puya and Themelis, Andreas and Patrinos, Panagiotis},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {38663--38682},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/oikonomidis24a/oikonomidis24a.pdf},
  url       = {https://proceedings.mlr.press/v235/oikonomidis24a.html},
  abstract  = {We show that adaptive proximal gradient methods for convex problems are not restricted to traditional Lipschitzian assumptions. Our analysis reveals that a class of linesearch-free methods is still convergent under mere local Hölder gradient continuity, covering in particular continuously differentiable semi-algebraic functions. To mitigate the lack of local Lipschitz continuity, popular approaches revolve around $\varepsilon$-oracles and/or linesearch procedures. In contrast, we exploit plain Hölder inequalities not entailing any approximation, all while retaining the linesearch-free nature of adaptive schemes. Furthermore, we prove full sequence convergence without prior knowledge of local Hölder constants nor of the order of Hölder continuity. Numerical experiments make comparisons with baseline methods on diverse tasks from machine learning covering both the locally and the globally Hölder setting.}
}
Endnote
%0 Conference Paper
%T Adaptive Proximal Gradient Methods Are Universal Without Approximation
%A Konstantinos Oikonomidis
%A Emanuel Laude
%A Puya Latafat
%A Andreas Themelis
%A Panagiotis Patrinos
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-oikonomidis24a
%I PMLR
%P 38663--38682
%U https://proceedings.mlr.press/v235/oikonomidis24a.html
%V 235
%X We show that adaptive proximal gradient methods for convex problems are not restricted to traditional Lipschitzian assumptions. Our analysis reveals that a class of linesearch-free methods is still convergent under mere local Hölder gradient continuity, covering in particular continuously differentiable semi-algebraic functions. To mitigate the lack of local Lipschitz continuity, popular approaches revolve around $\varepsilon$-oracles and/or linesearch procedures. In contrast, we exploit plain Hölder inequalities not entailing any approximation, all while retaining the linesearch-free nature of adaptive schemes. Furthermore, we prove full sequence convergence without prior knowledge of local Hölder constants nor of the order of Hölder continuity. Numerical experiments make comparisons with baseline methods on diverse tasks from machine learning covering both the locally and the globally Hölder setting.
APA
Oikonomidis, K., Laude, E., Latafat, P., Themelis, A. & Patrinos, P. (2024). Adaptive Proximal Gradient Methods Are Universal Without Approximation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:38663-38682. Available from https://proceedings.mlr.press/v235/oikonomidis24a.html.
