EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression

Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:11761-11807, 2023.

Abstract

In this work we focus our attention on distributed optimization problems in the context where the communication time between the server and the workers is non-negligible. We obtain novel methods supporting bidirectional compression (both from the server to the workers and vice versa) that enjoy new state-of-the-art theoretical communication complexity for convex and nonconvex problems. Our bounds are the first that manage to decouple the variance/error coming from the workers-to-server and server-to-workers compression, transforming a multiplicative dependence to an additive one. Moreover, in the convex regime, we obtain the first bounds that match the theoretical communication complexity of gradient descent. Even in this convex regime, our algorithms work with biased gradient estimators, which is non-standard and requires new proof techniques that may be of independent interest. Finally, our theoretical results are corroborated through suitable experiments.
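The communication pattern the abstract refers to is easier to see with a toy sketch. The snippet below is an illustrative assumption, not the paper's EF21-P method: workers apply a contractive Top-K compressor to their gradients before sending them to the server (uplink), and the server compresses the model update before broadcasting it back to the workers (downlink). The names `top_k` and `bidirectional_compressed_gd` are hypothetical, and this naive scheme omits the error-feedback corrections that EF21-P uses to control compression bias.

```python
import numpy as np

def top_k(v, k):
    """Contractive Top-K compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def bidirectional_compressed_gd(grads, x0, lr=0.1, k_up=2, k_down=2, iters=200):
    """Naive bidirectionally compressed gradient descent (illustration only, not EF21-P).

    grads:  list of per-worker gradient functions g_i(x); their average is the full gradient.
    k_up:   sparsity of the worker-to-server (uplink) compressor.
    k_down: sparsity of the server-to-worker (downlink) compressor.
    """
    x = x0.copy()
    for _ in range(iters):
        # Uplink: each worker compresses its local gradient before sending it to the server.
        g = np.mean([top_k(gi(x), k_up) for gi in grads], axis=0)
        # Downlink: the server compresses the model update before broadcasting it back.
        x = x - top_k(lr * g, k_down)
    return x

# Toy usage: two workers holding quadratics f_i(x) = 0.5 * ||x - a_i||^2.
a1, a2 = np.array([1.0, 2.0, 3.0, 4.0]), np.array([3.0, 2.0, 1.0, 0.0])
grads = [lambda x: x - a1, lambda x: x - a2]
x_out = bidirectional_compressed_gd(grads, x0=np.zeros(4))
# Approaches a neighbourhood of the minimiser (a1 + a2) / 2; without error feedback,
# compression bias generally prevents exact convergence.
print(x_out)
```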

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-gruntkowska23a,
  title     = {{EF}21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression},
  author    = {Gruntkowska, Kaja and Tyurin, Alexander and Richt\'{a}rik, Peter},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {11761--11807},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/gruntkowska23a/gruntkowska23a.pdf},
  url       = {https://proceedings.mlr.press/v202/gruntkowska23a.html},
  abstract  = {In this work we focus our attention on distributed optimization problems in the context where the communication time between the server and the workers is non-negligible. We obtain novel methods supporting bidirectional compression (both from the server to the workers and vice versa) that enjoy new state-of-the-art theoretical communication complexity for convex and nonconvex problems. Our bounds are the first that manage to decouple the variance/error coming from the workers-to-server and server-to-workers compression, transforming a multiplicative dependence to an additive one. Moreover, in the convex regime, we obtain the first bounds that match the theoretical communication complexity of gradient descent. Even in this convex regime, our algorithms work with biased gradient estimators, which is non-standard and requires new proof techniques that may be of independent interest. Finally, our theoretical results are corroborated through suitable experiments.}
}
Endnote
%0 Conference Paper
%T EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression
%A Kaja Gruntkowska
%A Alexander Tyurin
%A Peter Richtárik
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-gruntkowska23a
%I PMLR
%P 11761--11807
%U https://proceedings.mlr.press/v202/gruntkowska23a.html
%V 202
%X In this work we focus our attention on distributed optimization problems in the context where the communication time between the server and the workers is non-negligible. We obtain novel methods supporting bidirectional compression (both from the server to the workers and vice versa) that enjoy new state-of-the-art theoretical communication complexity for convex and nonconvex problems. Our bounds are the first that manage to decouple the variance/error coming from the workers-to-server and server-to-workers compression, transforming a multiplicative dependence to an additive one. Moreover, in the convex regime, we obtain the first bounds that match the theoretical communication complexity of gradient descent. Even in this convex regime, our algorithms work with biased gradient estimators, which is non-standard and requires new proof techniques that may be of independent interest. Finally, our theoretical results are corroborated through suitable experiments.
APA
Gruntkowska, K., Tyurin, A. & Richtárik, P. (2023). EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:11761-11807. Available from https://proceedings.mlr.press/v202/gruntkowska23a.html.