Momentum Ensures Convergence of SIGNSGD under Weaker Assumptions

Tao Sun, Qingsong Wang, Dongsheng Li, Bao Wang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:33077-33099, 2023.

Abstract

Sign Stochastic Gradient Descent (signSGD) is a communication-efficient stochastic algorithm that uses only the sign information of the stochastic gradient to update the model’s weights. However, the existing convergence theory of signSGD either requires increasing batch sizes during training or assumes the gradient noise is symmetric and unimodal. Error feedback has been used to guarantee the convergence of signSGD under weaker assumptions, at the cost of communication overhead. This paper revisits the convergence of signSGD and proves that momentum can remedy signSGD under weaker assumptions than previous techniques; in particular, our convergence theory requires neither bounded stochastic gradients nor increasing batch sizes. Our results echo previous empirical findings that, unlike signSGD, signSGD with momentum maintains good performance even with small batch sizes. Another new result is that signSGD with momentum achieves an improved convergence rate when the objective function is second-order smooth. We further extend our theory to signSGD with majority vote and to federated learning.
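For readers unfamiliar with the update rule discussed in the abstract, below is a minimal sketch of signSGD with momentum on a toy least-squares problem. The problem setup, hyperparameters (learning rate, momentum coefficient, batch size), and variable names are illustrative choices for this sketch, not the settings analyzed in the paper.

```python
# Minimal sketch of the signSGD-with-momentum update on a toy
# least-squares objective. Hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 20))
b = A @ rng.normal(size=20) + 0.1 * rng.normal(size=1000)

def stochastic_grad(x, batch_size=8):
    """Mini-batch gradient of f(x) = 0.5 * mean((A x - b)^2)."""
    idx = rng.integers(0, A.shape[0], size=batch_size)
    Ab, bb = A[idx], b[idx]
    return Ab.T @ (Ab @ x - bb) / batch_size

x = np.zeros(20)      # model weights
m = np.zeros(20)      # momentum buffer
lr, beta = 1e-3, 0.9  # illustrative learning rate and momentum coefficient
for t in range(5000):
    g = stochastic_grad(x)
    m = beta * m + (1 - beta) * g   # exponential moving average of gradients
    x -= lr * np.sign(m)            # update uses only the sign of the momentum

print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```

The point of the sketch is that only the elementwise sign of the momentum buffer enters the update, so a distributed implementation need communicate only one bit per coordinate while the momentum averaging smooths out the small-batch gradient noise.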

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-sun23l,
  title     = {Momentum Ensures Convergence of {SIGNSGD} under Weaker Assumptions},
  author    = {Sun, Tao and Wang, Qingsong and Li, Dongsheng and Wang, Bao},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {33077--33099},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/sun23l/sun23l.pdf},
  url       = {https://proceedings.mlr.press/v202/sun23l.html},
  abstract  = {Sign Stochastic Gradient Descent (signSGD) is a communication-efficient stochastic algorithm that only uses the sign information of the stochastic gradient to update the model’s weights. However, the existing convergence theory of signSGD either requires increasing batch sizes during training or assumes the gradient noise is symmetric and unimodal. Error feedback has been used to guarantee the convergence of signSGD under weaker assumptions at the cost of communication overhead. This paper revisits the convergence of signSGD and proves that momentum can remedy signSGD under weaker assumptions than previous techniques; in particular, our convergence theory does not require the assumption of bounded stochastic gradient or increased batch size. Our results resonate with echoes of previous empirical results where, unlike signSGD, signSGD with momentum maintains good performance even with small batch sizes. Another new result is that signSGD with momentum can achieve an improved convergence rate when the objective function is second-order smooth. We further extend our theory to signSGD with major vote and federated learning.}
}
Endnote
%0 Conference Paper
%T Momentum Ensures Convergence of SIGNSGD under Weaker Assumptions
%A Tao Sun
%A Qingsong Wang
%A Dongsheng Li
%A Bao Wang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-sun23l
%I PMLR
%P 33077--33099
%U https://proceedings.mlr.press/v202/sun23l.html
%V 202
%X Sign Stochastic Gradient Descent (signSGD) is a communication-efficient stochastic algorithm that only uses the sign information of the stochastic gradient to update the model’s weights. However, the existing convergence theory of signSGD either requires increasing batch sizes during training or assumes the gradient noise is symmetric and unimodal. Error feedback has been used to guarantee the convergence of signSGD under weaker assumptions at the cost of communication overhead. This paper revisits the convergence of signSGD and proves that momentum can remedy signSGD under weaker assumptions than previous techniques; in particular, our convergence theory does not require the assumption of bounded stochastic gradient or increased batch size. Our results resonate with echoes of previous empirical results where, unlike signSGD, signSGD with momentum maintains good performance even with small batch sizes. Another new result is that signSGD with momentum can achieve an improved convergence rate when the objective function is second-order smooth. We further extend our theory to signSGD with major vote and federated learning.
APA
Sun, T., Wang, Q., Li, D. & Wang, B. (2023). Momentum Ensures Convergence of SIGNSGD under Weaker Assumptions. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:33077-33099. Available from https://proceedings.mlr.press/v202/sun23l.html.