FetchSGD: Communication-Efficient Federated Learning with Sketching

Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, Raman Arora
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8253-8265, 2020.

Abstract

Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.
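
The abstract describes the mechanism at a high level; the following is a minimal Python sketch of that idea, not the authors' implementation. It illustrates how a Count Sketch's linearity lets client sketches be merged by summation and lets momentum and error feedback be maintained entirely in sketch space on the aggregator. All names (CountSketch, fetch_sgd_round, the parameters k, rho, lr) are illustrative assumptions, and the coordinate-wise unsketching heuristic below is a naive stand-in for the paper's heavy-hitter recovery procedure.

import numpy as np

class CountSketch:
    """Linear sketch of a d-dimensional vector into an r x c table."""
    def __init__(self, d, rows=5, cols=10_000, seed=0):
        rng = np.random.default_rng(seed)
        self.d, self.rows, self.cols = d, rows, cols
        # Hash each coordinate to one bucket per row, with a random sign.
        self.bucket = rng.integers(0, cols, size=(rows, d))
        self.sign = rng.choice([-1.0, 1.0], size=(rows, d))
        self.table = np.zeros((rows, cols))

    def accumulate(self, vec):
        # Sketching is linear: S(a*x + b*y) = a*S(x) + b*S(y).
        for r in range(self.rows):
            np.add.at(self.table[r], self.bucket[r], self.sign[r] * vec)

    def query(self):
        # Median-of-rows estimate of every coordinate of the sketched vector.
        est = np.stack([self.sign[r] * self.table[r, self.bucket[r]]
                        for r in range(self.rows)])
        return np.median(est, axis=0)

def fetch_sgd_round(client_grads, momentum_sketch, error_sketch,
                    d, k=10, rho=0.9, lr=0.1):
    """One aggregator round: merge client sketches, then apply momentum and
    error feedback in sketch space before extracting a top-k update."""
    # Each client sketches its gradient; the aggregator merges by summing tables.
    round_sketch = CountSketch(d)
    for g in client_grads:
        s = CountSketch(d)  # shared seed => identical hash functions
        s.accumulate(g / len(client_grads))
        round_sketch.table += s.table
    # Momentum and error accumulation both live in sketch space (linearity).
    momentum_sketch.table = rho * momentum_sketch.table + round_sketch.table
    error_sketch.table += lr * momentum_sketch.table
    # Unsketch: estimate coordinates, keep the k largest, and subtract them
    # back out of the error sketch so the residual feeds into future rounds.
    est = error_sketch.query()
    topk_idx = np.argsort(np.abs(est))[-k:]
    update = np.zeros(d)
    update[topk_idx] = est[topk_idx]
    correction = CountSketch(d)
    correction.accumulate(update)
    error_sketch.table -= correction.table
    return update  # broadcast to clients, who apply it to the model

if __name__ == "__main__":
    # Toy round with 10 simulated clients and a 1000-dimensional model.
    d = 1000
    momentum_s, error_s = CountSketch(d), CountSketch(d)
    grads = [np.random.randn(d) for _ in range(10)]
    print(fetch_sgd_round(grads, momentum_s, error_s, d))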

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-rothchild20a,
  title     = {{F}etch{SGD}: Communication-Efficient Federated Learning with Sketching},
  author    = {Rothchild, Daniel and Panda, Ashwinee and Ullah, Enayat and Ivkin, Nikita and Stoica, Ion and Braverman, Vladimir and Gonzalez, Joseph and Arora, Raman},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {8253--8265},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/rothchild20a/rothchild20a.pdf},
  url       = {https://proceedings.mlr.press/v119/rothchild20a.html},
  abstract  = {Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.}
}
Endnote
%0 Conference Paper
%T FetchSGD: Communication-Efficient Federated Learning with Sketching
%A Daniel Rothchild
%A Ashwinee Panda
%A Enayat Ullah
%A Nikita Ivkin
%A Ion Stoica
%A Vladimir Braverman
%A Joseph Gonzalez
%A Raman Arora
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-rothchild20a
%I PMLR
%P 8253--8265
%U https://proceedings.mlr.press/v119/rothchild20a.html
%V 119
%X Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.
APA
Rothchild, D., Panda, A., Ullah, E., Ivkin, N., Stoica, I., Braverman, V., Gonzalez, J. & Arora, R. (2020). FetchSGD: Communication-Efficient Federated Learning with Sketching. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8253-8265. Available from https://proceedings.mlr.press/v119/rothchild20a.html.
