Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness

Vien V. Mai, Mikael Johansson
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7325-7335, 2021.

Abstract

Stochastic gradient algorithms are often unstable when applied to functions that do not have Lipschitz-continuous and/or bounded gradients. Gradient clipping is a simple and effective technique to stabilize the training process for problems that are prone to the exploding gradient problem. Despite its widespread popularity, the convergence properties of the gradient clipping heuristic are poorly understood, especially for stochastic problems. This paper establishes both qualitative and quantitative convergence results of the clipped stochastic (sub)gradient method (SGD) for non-smooth convex functions with rapidly growing subgradients. Our analyses show that clipping enhances the stability of SGD and that the clipped SGD algorithm enjoys finite convergence rates in many cases. We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case, for weakly convex problems under standard assumptions. With a novel Lyapunov analysis, we show that the proposed method achieves the best-known rate for the considered class of problems, demonstrating the effectiveness of clipped methods also in this regime. Numerical results confirm our theoretical developments.
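As a point of reference for the clipping operation discussed above, here is a minimal sketch of a clipped stochastic (sub)gradient step using standard clip-by-norm. The step size gamma, the threshold tau, and the function names are illustrative choices for exposition, not taken from the paper.

import numpy as np

def clip_by_norm(g, tau):
    # Rescale g so that its Euclidean norm does not exceed the threshold tau.
    norm = np.linalg.norm(g)
    return g if norm <= tau else (tau / norm) * g

def clipped_sgd_step(x, sample_subgradient, gamma, tau):
    # One clipped SGD update: x <- x - gamma * clip(g, tau),
    # where g is a stochastic subgradient of the objective at x.
    g = sample_subgradient(x)
    return x - gamma * clip_by_norm(g, tau)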

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-mai21a,
  title     = {Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness},
  author    = {Mai, Vien V. and Johansson, Mikael},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {7325--7335},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/mai21a/mai21a.pdf},
  url       = {https://proceedings.mlr.press/v139/mai21a.html},
  abstract  = {Stochastic gradient algorithms are often unstable when applied to functions that do not have Lipschitz-continuous and/or bounded gradients. Gradient clipping is a simple and effective technique to stabilize the training process for problems that are prone to the exploding gradient problem. Despite its widespread popularity, the convergence properties of the gradient clipping heuristic are poorly understood, especially for stochastic problems. This paper establishes both qualitative and quantitative convergence results of the clipped stochastic (sub)gradient method (SGD) for non-smooth convex functions with rapidly growing subgradients. Our analyses show that clipping enhances the stability of SGD and that the clipped SGD algorithm enjoys finite convergence rates in many cases. We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case, for weakly convex problems under standard assumptions. With a novel Lyapunov analysis, we show that the proposed method achieves the best-known rate for the considered class of problems, demonstrating the effectiveness of clipped methods also in this regime. Numerical results confirm our theoretical developments.}
}
Endnote
%0 Conference Paper
%T Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness
%A Vien V. Mai
%A Mikael Johansson
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-mai21a
%I PMLR
%P 7325--7335
%U https://proceedings.mlr.press/v139/mai21a.html
%V 139
%X Stochastic gradient algorithms are often unstable when applied to functions that do not have Lipschitz-continuous and/or bounded gradients. Gradient clipping is a simple and effective technique to stabilize the training process for problems that are prone to the exploding gradient problem. Despite its widespread popularity, the convergence properties of the gradient clipping heuristic are poorly understood, especially for stochastic problems. This paper establishes both qualitative and quantitative convergence results of the clipped stochastic (sub)gradient method (SGD) for non-smooth convex functions with rapidly growing subgradients. Our analyses show that clipping enhances the stability of SGD and that the clipped SGD algorithm enjoys finite convergence rates in many cases. We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case, for weakly convex problems under standard assumptions. With a novel Lyapunov analysis, we show that the proposed method achieves the best-known rate for the considered class of problems, demonstrating the effectiveness of clipped methods also in this regime. Numerical results confirm our theoretical developments.
APA
Mai, V.V. & Johansson, M. (2021). Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7325-7335. Available from https://proceedings.mlr.press/v139/mai21a.html.