Error Feedback Fixes SignSGD and other Gradient Compression Schemes

Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian Stich, Martin Jaggi
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3252-3261, 2019.

Abstract

Sign-based algorithms (e.g. signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly when compared with SGD. These issues arise because of the biased nature of the sign compression operator. We then show that using error-feedback, i.e. incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm (EF-SGD) with arbitrary compression operator achieves the same rate of convergence as SGD without any additional assumptions. Thus EF-SGD achieves gradient compression for free. Our experiments thoroughly substantiate the theory.
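To make the error-feedback idea concrete, below is a minimal, hypothetical sketch in NumPy of a single EF-SGD step with a scaled-sign compressor, written from the high-level description in the abstract rather than from the authors' reference implementation; the names scaled_sign and ef_sgd_step are placeholders chosen for illustration.

import numpy as np

def scaled_sign(p):
    # Scaled sign compressor (illustrative): keep only the signs,
    # rescaled by the mean absolute value so the compressed vector
    # retains a comparable magnitude.
    d = p.size
    return (np.linalg.norm(p, 1) / d) * np.sign(p)

def ef_sgd_step(x, grad, error, lr):
    # One error-feedback step (sketch):
    # 1. add the residual left over from the previous compression to the scaled gradient,
    # 2. compress the corrected update and apply it to the parameters,
    # 3. store whatever the compressor discarded as the new residual.
    p = lr * grad + error          # corrected update
    delta = scaled_sign(p)         # compressed update actually transmitted/applied
    x_new = x - delta              # parameter step
    error_new = p - delta          # residual carried into the next step
    return x_new, error_new

The key point of the sketch is the last line: the compression error is not dropped but fed back into the next update, which is what restores the SGD convergence rate for an arbitrary (possibly biased) compressor.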

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-karimireddy19a,
  title     = {Error Feedback Fixes {S}ign{SGD} and other Gradient Compression Schemes},
  author    = {Karimireddy, Sai Praneeth and Rebjock, Quentin and Stich, Sebastian and Jaggi, Martin},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {3252--3261},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/karimireddy19a/karimireddy19a.pdf},
  url       = {https://proceedings.mlr.press/v97/karimireddy19a.html},
  abstract  = {Sign-based algorithms (e.g. signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly when compared with SGD. These issues arise because of the biased nature of the sign compression operator. We then show that using error-feedback, i.e. incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm (EF-SGD) with arbitrary compression operator achieves the same rate of convergence as SGD without any additional assumptions. Thus EF-SGD achieves gradient compression for free. Our experiments thoroughly substantiate the theory.}
}
Endnote
%0 Conference Paper
%T Error Feedback Fixes SignSGD and other Gradient Compression Schemes
%A Sai Praneeth Karimireddy
%A Quentin Rebjock
%A Sebastian Stich
%A Martin Jaggi
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-karimireddy19a
%I PMLR
%P 3252--3261
%U https://proceedings.mlr.press/v97/karimireddy19a.html
%V 97
%X Sign-based algorithms (e.g. signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly when compared with SGD. These issues arise because of the biased nature of the sign compression operator. We then show that using error-feedback, i.e. incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm (EF-SGD) with arbitrary compression operator achieves the same rate of convergence as SGD without any additional assumptions. Thus EF-SGD achieves gradient compression for free. Our experiments thoroughly substantiate the theory.
APA
Karimireddy, S.P., Rebjock, Q., Stich, S. & Jaggi, M. (2019). Error Feedback Fixes SignSGD and other Gradient Compression Schemes. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3252-3261. Available from https://proceedings.mlr.press/v97/karimireddy19a.html.