Decentralized gradient methods: does topology matter?

Giovanni Neglia, Chuan Xu, Don Towsley, Gianmarco Calbi
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2348-2358, 2020.

Abstract

Consensus-based distributed optimization methods have recently been advocated as alternatives to the parameter server and ring all-reduce paradigms for large-scale training of machine learning models. In this setting, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by averaging the estimates obtained from its neighbors and applying a correction based on its local dataset. While theoretical results suggest that the worker communication topology should have a strong impact on the number of epochs needed to converge, previous experiments have suggested the opposite. This paper sheds light on this apparent contradiction and shows how sparse topologies can lead to faster convergence even in the absence of communication delays.
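
For intuition, the update described in the abstract can be viewed as a consensus (mixing) step followed by a local gradient step. Below is a minimal Python/NumPy sketch of such a consensus-based scheme, assuming a ring topology with a uniform doubly stochastic mixing matrix and local least-squares objectives; this is an illustrative sketch, not the paper's exact algorithm or experimental setup.

import numpy as np

def ring_mixing_matrix(n):
    # Doubly stochastic mixing matrix for a ring: each worker averages
    # itself and its two neighbors with weight 1/3 (an assumed choice).
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    return W

def decentralized_gd(A_list, b_list, W, lr=0.01, iters=500):
    # Worker i holds the local objective 0.5 * ||A_i x - b_i||^2.
    n, d = len(A_list), A_list[0].shape[1]
    X = np.zeros((n, d))                 # row i = worker i's current estimate
    for _ in range(iters):
        X = W @ X                        # consensus step: average neighbors' estimates
        grads = np.stack([A.T @ (A @ x - b)
                          for A, x, b in zip(A_list, X, b_list)])
        X = X - lr * grads               # local correction from each worker's own data
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_workers, d = 8, 5
    x_true = rng.normal(size=d)
    A_list = [rng.normal(size=(20, d)) for _ in range(n_workers)]
    b_list = [A @ x_true + 0.01 * rng.normal(size=20) for A in A_list]
    X = decentralized_gd(A_list, b_list, ring_mixing_matrix(n_workers))
    print("max distance to consensus:", np.abs(X - X.mean(axis=0)).max())
    print("error vs. x_true:", np.linalg.norm(X.mean(axis=0) - x_true))

Sparser topologies correspond to sparser mixing matrices W, which is the design dimension whose effect on convergence the paper studies.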

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-neglia20a,
  title     = {Decentralized gradient methods: does topology matter?},
  author    = {Neglia, Giovanni and Xu, Chuan and Towsley, Don and Calbi, Gianmarco},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {2348--2358},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/neglia20a/neglia20a.pdf},
  url       = {https://proceedings.mlr.press/v108/neglia20a.html},
  abstract  = {Consensus-based distributed optimization methods have recently been advocated as alternatives to the parameter server and ring all-reduce paradigms for large-scale training of machine learning models. In this setting, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by averaging the estimates obtained from its neighbors and applying a correction based on its local dataset. While theoretical results suggest that the worker communication topology should have a strong impact on the number of epochs needed to converge, previous experiments have suggested the opposite. This paper sheds light on this apparent contradiction and shows how sparse topologies can lead to faster convergence even in the absence of communication delays.}
}
Endnote
%0 Conference Paper
%T Decentralized gradient methods: does topology matter?
%A Giovanni Neglia
%A Chuan Xu
%A Don Towsley
%A Gianmarco Calbi
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-neglia20a
%I PMLR
%P 2348--2358
%U https://proceedings.mlr.press/v108/neglia20a.html
%V 108
%X Consensus-based distributed optimization methods have recently been advocated as alternatives to the parameter server and ring all-reduce paradigms for large-scale training of machine learning models. In this setting, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by averaging the estimates obtained from its neighbors and applying a correction based on its local dataset. While theoretical results suggest that the worker communication topology should have a strong impact on the number of epochs needed to converge, previous experiments have suggested the opposite. This paper sheds light on this apparent contradiction and shows how sparse topologies can lead to faster convergence even in the absence of communication delays.
APA
Neglia, G., Xu, C., Towsley, D. & Calbi, G. (2020). Decentralized gradient methods: does topology matter?. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2348-2358. Available from https://proceedings.mlr.press/v108/neglia20a.html.