DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

Lisang Ding; Kexin Jin; Bicheng Ying; Kun Yuan; Wotao Yin

DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

Lisang Ding, Kexin Jin, Bicheng Ying, Kun Yuan, Wotao Yin

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:8067-8089, 2023.

Abstract

Decentralized Stochastic Gradient Descent (SGD) is an emerging neural network training approach that enables multiple agents to train a model collaboratively and simultaneously. Rather than using a central parameter server to collect gradients from all the agents, each agent keeps a copy of the model parameters and communicates with a small number of other agents to exchange model updates. Their communication, governed by the communication topology and gossip weight matrices, facilitates the exchange of model updates. The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training times and improved scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. In particular, DSGD-CECA incurs a unit per-iteration communication overhead and an

$\tilde{O}(n^3)$ transient iteration complexity. Our proof is based on newly discovered properties of gossip weight matrices and a novel approach to combine them with DSGD’s convergence analysis. Numerical experiments show the efficiency of DSGD-CECA.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-ding23b,
  title = 	 {{DSGD}-{CECA}: Decentralized {SGD} with Communication-Optimal Exact Consensus Algorithm},
  author =       {Ding, Lisang and Jin, Kexin and Ying, Bicheng and Yuan, Kun and Yin, Wotao},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {8067--8089},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/ding23b/ding23b.pdf},
  url = 	 {https://proceedings.mlr.press/v202/ding23b.html},
  abstract = 	 {Decentralized Stochastic Gradient Descent (SGD) is an emerging neural network training approach that enables multiple agents to train a model collaboratively and simultaneously. Rather than using a central parameter server to collect gradients from all the agents, each agent keeps a copy of the model parameters and communicates with a small number of other agents to exchange model updates. Their communication, governed by the communication topology and gossip weight matrices, facilitates the exchange of model updates. The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training times and improved scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. In particular, DSGD-CECA incurs a unit per-iteration communication overhead and an $\tilde{O}(n^3)$ transient iteration complexity. Our proof is based on newly discovered properties of gossip weight matrices and a novel approach to combine them with DSGD’s convergence analysis. Numerical experiments show the efficiency of DSGD-CECA.}
}

Endnote

%0 Conference Paper
%T DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm
%A Lisang Ding
%A Kexin Jin
%A Bicheng Ying
%A Kun Yuan
%A Wotao Yin
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-ding23b
%I PMLR
%P 8067--8089
%U https://proceedings.mlr.press/v202/ding23b.html
%V 202
%X Decentralized Stochastic Gradient Descent (SGD) is an emerging neural network training approach that enables multiple agents to train a model collaboratively and simultaneously. Rather than using a central parameter server to collect gradients from all the agents, each agent keeps a copy of the model parameters and communicates with a small number of other agents to exchange model updates. Their communication, governed by the communication topology and gossip weight matrices, facilitates the exchange of model updates. The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training times and improved scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. In particular, DSGD-CECA incurs a unit per-iteration communication overhead and an $\tilde{O}(n^3)$ transient iteration complexity. Our proof is based on newly discovered properties of gossip weight matrices and a novel approach to combine them with DSGD’s convergence analysis. Numerical experiments show the efficiency of DSGD-CECA.

APA


Ding, L., Jin, K., Ying, B., Yuan, K. & Yin, W.. (2023). DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:8067-8089 Available from https://proceedings.mlr.press/v202/ding23b.html.

DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm

Abstract

Cite this Paper

Related Material