A Double Residual Compression Algorithm for Efficient Distributed Learning
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:133-143, 2020.
Abstract
Large-scale machine learning models are often trained by parallel stochastic gradient descent algorithms. However, the communication cost of gradient aggregation and model synchronization between the master and worker nodes becomes the major obstacle to efficient learning as the number of workers and the dimension of the model increase. In this paper, we propose DORE, a DOuble REsidual compression stochastic gradient descent algorithm, which reduces the overall communication by over $95\%$ and thereby greatly alleviates this obstacle. Our theoretical analyses demonstrate that the proposed strategy has superior convergence properties for both strongly convex and nonconvex objective functions. The experimental results validate that DORE achieves the best communication efficiency while maintaining similar model accuracy and convergence speed in comparison with state-of-the-art baselines.
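The abstract does not detail DORE's update rules, but the general idea of residual compression can be illustrated with a minimal sketch: each side compresses only the difference between a fresh quantity (gradient or model) and a locally maintained state, so that what travels over the network is a small correction rather than a full vector. The sketch below is an assumption-laden illustration of this generic pattern, not the paper's algorithm; the names `compress_topk`, `Worker`, and `Master`, the top-k compressor, and the learning rate are all hypothetical choices.

```python
import numpy as np

def compress_topk(x, k):
    """Illustrative compressor: keep only the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

class Worker:
    """Hypothetical worker: compresses the residual between its stochastic
    gradient and a local state vector mirrored on the master."""
    def __init__(self, dim, k):
        self.state = np.zeros(dim)
        self.k = k

    def compress_gradient(self, grad):
        residual = grad - self.state            # part the master does not know yet
        delta = compress_topk(residual, self.k)
        self.state += delta                     # keep worker and master states in sync
        return delta                            # only this sparse correction is sent

class Master:
    """Hypothetical master: aggregates compressed gradient residuals and,
    symmetrically, broadcasts a compressed model residual back."""
    def __init__(self, dim, k, lr=0.1):
        self.grad_state = np.zeros(dim)
        self.model_state = np.zeros(dim)
        self.model = np.zeros(dim)
        self.k = k
        self.lr = lr

    def step(self, deltas):
        self.grad_state += np.mean(deltas, axis=0)   # recover the gradient estimate
        self.model -= self.lr * self.grad_state      # plain SGD step on the estimate
        residual = self.model - self.model_state     # model-side residual
        delta = compress_topk(residual, self.k)
        self.model_state += delta
        return delta                                 # compressed broadcast to workers
```

Compressing residuals on both the worker-to-master path (gradients) and the master-to-worker path (model updates) is what makes the compression "double"; the per-iteration traffic is then dominated by the sparse corrections rather than full-dimensional vectors.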