Optimal Complexity in Decentralized Training

Yucheng Lu, Christopher De Sa
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7111-7123, 2021.

Abstract

Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in the known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction that this lower bound is tight and achievable. Motivated by our insights, we further propose DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound up to only a logarithmic gap. Empirically, we compare DeTAG with other decentralized algorithms on image classification tasks and show that DeTAG enjoys faster convergence than the baselines, especially on unshuffled data and in sparse networks.
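
The snippet below is a minimal, illustrative sketch (not taken from the paper) of a gossip-style decentralized SGD round in the spirit of D-PSGD, the baseline the abstract refers to: each worker takes a local stochastic gradient step and then averages its parameters with its neighbors through a doubly stochastic mixing matrix. The ring topology, quadratic local objectives, step size, and all variable names are hypothetical choices made only for this demo.

# Illustrative sketch (not from the paper): one gossip-style decentralized
# SGD loop in the spirit of D-PSGD. Worker i takes a local stochastic
# gradient step on its own objective, then averages parameters with its
# neighbors via a doubly stochastic mixing matrix W. Topology, objectives,
# and hyperparameters below are hypothetical choices for the demo.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, lr, rounds = 8, 5, 0.1, 50

# Ring topology: each worker averages equally with itself and its two neighbors.
W = np.zeros((n_workers, n_workers))
for i in range(n_workers):
    W[i, i] = W[i, (i - 1) % n_workers] = W[i, (i + 1) % n_workers] = 1.0 / 3.0

# Each worker holds a local quadratic f_i(x) = 0.5 * ||x - a_i||^2, so the
# global minimizer is the mean of the a_i (a stand-in for heterogeneous data).
targets = rng.normal(size=(n_workers, dim))
x = rng.normal(size=(n_workers, dim))        # one parameter vector per worker

for _ in range(rounds):
    grads = x - targets + 0.1 * rng.normal(size=x.shape)  # noisy local gradients
    x = W @ (x - lr * grads)                 # local SGD step, then gossip averaging

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0)))

Running this shows the workers' copies converging toward consensus while the averaged iterate approaches the global minimizer; sparser mixing matrices (the regime highlighted in the abstract) slow the consensus term down.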

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-lu21a,
  title     = {Optimal Complexity in Decentralized Training},
  author    = {Lu, Yucheng and De Sa, Christopher},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {7111--7123},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/lu21a/lu21a.pdf},
  url       = {https://proceedings.mlr.press/v139/lu21a.html},
  abstract  = {Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction this lower bound is tight and achievable. Motivated by our insights, we further propose DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound with only a logarithm gap. Empirically, we compare DeTAG with other decentralized algorithms on image classification tasks, and we show DeTAG enjoys faster convergence compared to baselines, especially on unshuffled data and in sparse networks.}
}
Endnote
%0 Conference Paper
%T Optimal Complexity in Decentralized Training
%A Yucheng Lu
%A Christopher De Sa
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-lu21a
%I PMLR
%P 7111--7123
%U https://proceedings.mlr.press/v139/lu21a.html
%V 139
%X Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction this lower bound is tight and achievable. Motivated by our insights, we further propose DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound with only a logarithm gap. Empirically, we compare DeTAG with other decentralized algorithms on image classification tasks, and we show DeTAG enjoys faster convergence compared to baselines, especially on unshuffled data and in sparse networks.
APA
Lu, Y., & De Sa, C. (2021). Optimal Complexity in Decentralized Training. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7111-7123. Available from https://proceedings.mlr.press/v139/lu21a.html.