Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization

Yujia Wang; Lu Lin; Jinghui Chen

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:6292-6320, 2022.

Abstract

Due to the explosion in the size of training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has proven successful in reducing communication costs with stochastic gradient descent (SGD), there have been far fewer attempts at building communication-efficient adaptive gradient methods with provable guarantees, even though adaptive gradient methods are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization problems, which is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient distributed adaptive gradient method converges to a first-order stationary point with the same iteration complexity as uncompressed vanilla AMSGrad in the stochastic nonconvex optimization setting. Experiments on various benchmarks back up our theory.

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-wang22e,
  title     = {Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization},
  author    = {Wang, Yujia and Lin, Lu and Chen, Jinghui},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {6292--6320},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/wang22e/wang22e.pdf},
  url       = {https://proceedings.mlr.press/v151/wang22e.html},
  abstract  = {Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has been proven to be successful in reducing communication costs with stochastic gradient descent (SGD), there are much fewer attempts in building communication-efficient adaptive gradient methods with provable guarantees, which are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization problem, which is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient distributed adaptive gradient method converges to the first-order stationary point with the same iteration complexity as uncompressed vanilla AMSGrad in the stochastic nonconvex optimization setting. Experiments on various benchmarks back up our theory.}
}

Endnote

%0 Conference Paper
%T Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization
%A Yujia Wang
%A Lu Lin
%A Jinghui Chen
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-wang22e
%I PMLR
%P 6292--6320
%U https://proceedings.mlr.press/v151/wang22e.html
%V 151
%X Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has been proven to be successful in reducing communication costs with stochastic gradient descent (SGD), there are much fewer attempts in building communication-efficient adaptive gradient methods with provable guarantees, which are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization problem, which is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient distributed adaptive gradient method converges to the first-order stationary point with the same iteration complexity as uncompressed vanilla AMSGrad in the stochastic nonconvex optimization setting. Experiments on various benchmarks back up our theory.

APA


Wang, Y., Lin, L. & Chen, J. (2022). Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:6292-6320. Available from https://proceedings.mlr.press/v151/wang22e.html.
