Moniqua: Modulo Quantized Communication in Decentralized SGD

Yucheng Lu, Christopher De Sa
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:6415-6425, 2020.

Abstract

Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove in theory that Moniqua communicates a provably bounded number of bits per iteration, while converging at the same asymptotic rate as the original algorithm does with full-precision communication. Moniqua improves upon prior works in that it (1) requires zero additional memory, (2) works with 1-bit quantization, and (3) is applicable to a variety of decentralized algorithms. We demonstrate empirically that Moniqua converges faster with respect to wall clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit-budgets, allowing $1$-bit-per-parameter communication without compromising validation accuracy when training ResNet20 and ResNet110 on CIFAR10.
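The abstract describes the idea only at a high level; below is a minimal sketch of what "modulo quantized communication" can look like. This is an illustration, not the paper's exact algorithm: it assumes a deterministic uniform quantizer, a scalar bound theta on how far any two workers' parameters drift apart, and hypothetical names (encode, decode, levels). The point it demonstrates is that a worker can transmit its parameters modulo a small range, and a neighbor can reconstruct them from its own nearby local copy, so the bits needed per parameter scale with the disagreement bound rather than the full dynamic range of the parameters.

import numpy as np

def encode(x, theta, levels):
    # Transmit only x modulo theta, quantized to `levels` uniform bins in [0, theta).
    wrapped = np.mod(x, theta)
    step = theta / levels
    return np.round(wrapped / step) * step

def decode(msg, local, theta):
    # Reconstruct the sender's parameters from the wrapped message, using the
    # receiver's own local copy as a reference. Valid when the two copies differ
    # by less than theta/2 in every coordinate (bounded disagreement).
    diff = np.mod(msg - local, theta)
    diff = np.where(diff >= theta / 2, diff - theta, diff)
    return local + diff

# Toy check: a neighbor reconstructs x from a wrapped, quantized message.
x = np.array([0.30, -1.25, 2.00])           # sender's parameters
local = x + np.array([0.05, -0.03, 0.02])    # receiver's nearby local copy
theta = 0.5                                  # assumed disagreement bound (> 2x the max drift here)
msg = encode(x, theta, levels=4)             # only log2(levels) bits per parameter
print(decode(msg, local, theta))             # close to x, up to quantization error

With levels = 2, each parameter is described by a single bit, which is roughly the regime the abstract refers to as 1-bit-per-parameter communication.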

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-lu20a,
  title     = {Moniqua: Modulo Quantized Communication in Decentralized {SGD}},
  author    = {Lu, Yucheng and De Sa, Christopher},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {6415--6425},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/lu20a/lu20a.pdf},
  url       = {https://proceedings.mlr.press/v119/lu20a.html},
  abstract  = {Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove in theory that Moniqua communicates a provably bounded number of bits per iteration, while converging at the same asymptotic rate as the original algorithm does with full-precision communication. Moniqua improves upon prior works in that it (1) requires zero additional memory, (2) works with 1-bit quantization, and (3) is applicable to a variety of decentralized algorithms. We demonstrate empirically that Moniqua converges faster with respect to wall clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit-budgets, allowing $1$-bit-per-parameter communication without compromising validation accuracy when training ResNet20 and ResNet110 on CIFAR10.}
}
Endnote
%0 Conference Paper
%T Moniqua: Modulo Quantized Communication in Decentralized SGD
%A Yucheng Lu
%A Christopher De Sa
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-lu20a
%I PMLR
%P 6415--6425
%U https://proceedings.mlr.press/v119/lu20a.html
%V 119
%X Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove in theory that Moniqua communicates a provably bounded number of bits per iteration, while converging at the same asymptotic rate as the original algorithm does with full-precision communication. Moniqua improves upon prior works in that it (1) requires zero additional memory, (2) works with 1-bit quantization, and (3) is applicable to a variety of decentralized algorithms. We demonstrate empirically that Moniqua converges faster with respect to wall clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit-budgets, allowing $1$-bit-per-parameter communication without compromising validation accuracy when training ResNet20 and ResNet110 on CIFAR10.
APA
Lu, Y. & De Sa, C. (2020). Moniqua: Modulo Quantized Communication in Decentralized SGD. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:6415-6425. Available from https://proceedings.mlr.press/v119/lu20a.html.
