DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks

Zhuang Wang, Zhaozhuo Xu, Xinyu Wu, Anshumali Shrivastava, T. S. Eugene Ng
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:23274-23291, 2022.

Abstract

Data-parallel distributed training (DDT) has become the de facto standard for accelerating the training of most deep learning tasks on massively parallel hardware. In the DDT paradigm, the communication overhead of gradient synchronization is the major efficiency bottleneck. A widely adopted approach to tackle this issue is gradient sparsification (GS). However, current GS methods introduce significant new overhead when compressing the gradients; this compression cost can outweigh the communication overhead and become the new efficiency bottleneck. In this paper, we propose DRAGONN, a randomized hashing algorithm for GS in DDT. DRAGONN reduces compression time by up to 70% compared to state-of-the-art GS approaches and achieves up to a 3.52x speedup in total training throughput.
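
The abstract gives only a high-level description; DRAGONN's actual hashing scheme is detailed in the paper itself. Purely to illustrate the general idea the abstract hints at (replacing an exact top-k selection with a cheap, randomized, hash-based selection during gradient sparsification), here is a minimal NumPy sketch. The function names, the index-to-bucket hashing scheme, and the 1% density target are illustrative assumptions, not the paper's algorithm.

import numpy as np

def topk_sparsify(grad, k):
    # Exact top-k baseline: keep the k largest-magnitude entries.
    # Accurate, but requires a (partial) selection over the whole gradient.
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def hashed_sparsify(grad, k, seed=0):
    # Illustrative randomized alternative (an assumption, not DRAGONN itself):
    # hash every index into one of k buckets and keep the largest-magnitude
    # entry seen in each bucket. One linear pass, no global sort, at the cost
    # of only approximating the true top-k set.
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, k, size=grad.size)   # random hash: index -> bucket
    idx = np.zeros(k, dtype=np.int64)
    best = np.zeros(k)
    for i, g in enumerate(grad):
        b = buckets[i]
        if abs(g) > best[b]:
            best[b] = abs(g)
            idx[b] = i
    return idx, grad[idx]

# Toy usage: compress a 10,000-element gradient to ~1% density before
# synchronization; a worker would then communicate only (idx, vals).
g = np.random.randn(10_000).astype(np.float32)
idx, vals = hashed_sparsify(g, k=100)
sparse_update = np.zeros_like(g)
sparse_update[idx] = vals

The point of the sketch is the trade-off the abstract describes: the hashed pass touches each gradient element once and avoids any sort, which is why hash-based selection can cut compression time, at the price of an approximate rather than exact top-k selection.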

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-wang22aj,
  title     = {{DRAGONN}: Distributed Randomized Approximate Gradients of Neural Networks},
  author    = {Wang, Zhuang and Xu, Zhaozhuo and Wu, Xinyu and Shrivastava, Anshumali and Ng, T. S. Eugene},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {23274--23291},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/wang22aj/wang22aj.pdf},
  url       = {https://proceedings.mlr.press/v162/wang22aj.html},
  abstract  = {Data-parallel distributed training (DDT) has become the de-facto standard for accelerating the training of most deep learning tasks on massively parallel hardware. In the DDT paradigm, the communication overhead of gradient synchronization is the major efficiency bottleneck. A widely adopted approach to tackle this issue is gradient sparsification (GS). However, the current GS methods introduce significant new overhead in compressing the gradients, outweighing the communication overhead and becoming the new efficiency bottleneck. In this paper, we propose DRAGONN, a randomized hashing algorithm for GS in DDT. DRAGONN can significantly reduce the compression time by up to 70% compared to state-of-the-art GS approaches, and achieve up to 3.52x speedup in total training throughput.}
}
Endnote
%0 Conference Paper
%T DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks
%A Zhuang Wang
%A Zhaozhuo Xu
%A Xinyu Wu
%A Anshumali Shrivastava
%A T. S. Eugene Ng
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-wang22aj
%I PMLR
%P 23274--23291
%U https://proceedings.mlr.press/v162/wang22aj.html
%V 162
%X Data-parallel distributed training (DDT) has become the de-facto standard for accelerating the training of most deep learning tasks on massively parallel hardware. In the DDT paradigm, the communication overhead of gradient synchronization is the major efficiency bottleneck. A widely adopted approach to tackle this issue is gradient sparsification (GS). However, the current GS methods introduce significant new overhead in compressing the gradients, outweighing the communication overhead and becoming the new efficiency bottleneck. In this paper, we propose DRAGONN, a randomized hashing algorithm for GS in DDT. DRAGONN can significantly reduce the compression time by up to 70% compared to state-of-the-art GS approaches, and achieve up to 3.52x speedup in total training throughput.
APA
Wang, Z., Xu, Z., Wu, X., Shrivastava, A., & Ng, T. S. E. (2022). DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:23274-23291. Available from https://proceedings.mlr.press/v162/wang22aj.html.