High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise

Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:15951-16070, 2024.

Abstract

High-probability analysis of stochastic first-order optimization methods under mild assumptions on the noise has received considerable attention in recent years. Typically, gradient clipping is one of the key algorithmic ingredients used to derive good high-probability guarantees when the noise is heavy-tailed. However, if implemented naively, clipping can spoil the convergence of the popular methods for composite and distributed optimization (Prox-SGD/Parallel SGD) even in the absence of any noise. For this reason, many works on high-probability analysis consider only unconstrained non-distributed problems, and the existing results for composite/distributed problems do not cover some important special cases (such as strongly convex problems) and are not optimal. To address this issue, we propose new stochastic methods for composite and distributed optimization based on the clipping of stochastic gradient differences and prove tight high-probability convergence results (including nearly optimal ones) for the new methods. In addition, we develop new methods for composite and distributed variational inequalities and analyze the high-probability convergence of these methods.
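
To make the core algorithmic idea concrete (clipping stochastic gradient differences rather than the gradients themselves, so that the composite/prox structure is not spoiled), here is a minimal, illustrative sketch in Python. It is not a faithful reproduction of the paper's algorithms: the ℓ1 composite term, the particular shift update, and the parameters gamma, nu, and lam below are placeholder choices made only for illustration.

import numpy as np

def clip(v, lam):
    # Clipping operator: rescale v so that its Euclidean norm is at most lam.
    norm = np.linalg.norm(v)
    return v if norm <= lam else v * (lam / norm)

def prox_l1(x, gamma, reg):
    # Proximal operator of gamma * reg * ||.||_1 (soft-thresholding);
    # stands in for a generic composite/regularization term.
    return np.sign(x) * np.maximum(np.abs(x) - gamma * reg, 0.0)

def clipped_difference_step(x, shift, stoch_grad, lam, gamma, nu, reg):
    # Clip the *difference* between the stochastic gradient and a running
    # shift, instead of clipping the stochastic gradient directly.
    delta = clip(stoch_grad - shift, lam)
    g_hat = shift + delta              # gradient estimate used in the step
    new_shift = shift + nu * delta     # shift slowly tracks the gradient
    new_x = prox_l1(x - gamma * g_hat, gamma, reg)
    return new_x, new_shift

In a distributed setting one would, presumably, let each worker maintain its own shift and communicate only the clipped difference to the server; the sketch above shows a single-node version of the idea.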

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-gorbunov24a,
  title     = {High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise},
  author    = {Gorbunov, Eduard and Sadiev, Abdurakhmon and Danilova, Marina and Horv\'{a}th, Samuel and Gidel, Gauthier and Dvurechensky, Pavel and Gasnikov, Alexander and Richt\'{a}rik, Peter},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {15951--16070},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/gorbunov24a/gorbunov24a.pdf},
  url       = {https://proceedings.mlr.press/v235/gorbunov24a.html},
  abstract  = {High-probability analysis of stochastic first-order optimization methods under mild assumptions on the noise has been gaining a lot of attention in recent years. Typically, gradient clipping is one of the key algorithmic ingredients to derive good high-probability guarantees when the noise is heavy-tailed. However, if implemented naively, clipping can spoil the convergence of the popular methods for composite and distributed optimization (Prox-SGD/Parallel SGD) even in the absence of any noise. Due to this reason, many works on high-probability analysis consider only unconstrained non-distributed problems, and the existing results for composite/distributed problems do not include some important special cases (like strongly convex problems) and are not optimal. To address this issue, we propose new stochastic methods for composite and distributed optimization based on the clipping of stochastic gradient differences and prove tight high-probability convergence results (including nearly optimal ones) for the new methods. In addition, we also develop new methods for composite and distributed variational inequalities and analyze the high-probability convergence of these methods.}
}
Endnote
%0 Conference Paper
%T High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise
%A Eduard Gorbunov
%A Abdurakhmon Sadiev
%A Marina Danilova
%A Samuel Horváth
%A Gauthier Gidel
%A Pavel Dvurechensky
%A Alexander Gasnikov
%A Peter Richtárik
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-gorbunov24a
%I PMLR
%P 15951--16070
%U https://proceedings.mlr.press/v235/gorbunov24a.html
%V 235
%X High-probability analysis of stochastic first-order optimization methods under mild assumptions on the noise has been gaining a lot of attention in recent years. Typically, gradient clipping is one of the key algorithmic ingredients to derive good high-probability guarantees when the noise is heavy-tailed. However, if implemented naively, clipping can spoil the convergence of the popular methods for composite and distributed optimization (Prox-SGD/Parallel SGD) even in the absence of any noise. Due to this reason, many works on high-probability analysis consider only unconstrained non-distributed problems, and the existing results for composite/distributed problems do not include some important special cases (like strongly convex problems) and are not optimal. To address this issue, we propose new stochastic methods for composite and distributed optimization based on the clipping of stochastic gradient differences and prove tight high-probability convergence results (including nearly optimal ones) for the new methods. In addition, we also develop new methods for composite and distributed variational inequalities and analyze the high-probability convergence of these methods.
APA
Gorbunov, E., Sadiev, A., Danilova, M., Horváth, S., Gidel, G., Dvurechensky, P., Gasnikov, A. & Richtárik, P. (2024). High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:15951-16070. Available from https://proceedings.mlr.press/v235/gorbunov24a.html.