A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian Stich
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:5381-5393, 2020.

Abstract

Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods, which so far have required different intuitions, have different applications, and have been developed separately in various communities. Our algorithmic framework covers local SGD updates as well as synchronous and pairwise gossip updates on an adaptive network topology. We derive universal convergence rates for smooth (convex and non-convex) problems, and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
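
The algorithmic template analyzed in the paper alternates a local stochastic gradient step on every worker with a gossip-averaging step defined by a (possibly time-varying) mixing matrix W. Below is a minimal NumPy sketch of that template; the quadratic local losses, the ring-topology mixing matrix, the step size, and the noise level are illustrative assumptions, not the paper's setup.

# Minimal sketch of decentralized SGD with gossip averaging (assumed toy setup).
import numpy as np

n, d, T, eta = 8, 5, 200, 0.1            # workers, dimension, steps, step size (assumed)
rng = np.random.default_rng(0)

# Heterogeneous local objectives f_i(x) = 0.5 * ||x - b_i||^2 (illustrative choice).
b = rng.normal(size=(n, d))

def local_grad(i, x, noise=0.01):
    """Stochastic gradient of worker i's local loss at x (additive noise assumed)."""
    return (x - b[i]) + noise * rng.normal(size=d)

def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix for a ring topology (one possible choice of W)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W

X = np.zeros((n, d))                      # row i holds worker i's iterate x_i
for t in range(T):
    # Local SGD step on every worker.
    G = np.stack([local_grad(i, X[i]) for i in range(n)])
    X = X - eta * G
    # Gossip/averaging step with a (possibly time-varying) mixing matrix W^(t).
    W = ring_mixing_matrix(n)
    X = W @ X

print("consensus distance:", np.linalg.norm(X - X.mean(axis=0)))
print("distance of average iterate to optimum:", np.linalg.norm(X.mean(axis=0) - b.mean(axis=0)))

Choosing W as the identity between communication rounds recovers pure local SGD, while W = (1/n) * ones((n, n)) recovers exact averaging; time-varying or pairwise gossip matrices fit the same template.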

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-koloskova20a,
  title     = {A Unified Theory of Decentralized {SGD} with Changing Topology and Local Updates},
  author    = {Koloskova, Anastasia and Loizou, Nicolas and Boreiri, Sadra and Jaggi, Martin and Stich, Sebastian},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {5381--5393},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/koloskova20a/koloskova20a.pdf},
  url       = {https://proceedings.mlr.press/v119/koloskova20a.html},
  abstract  = {Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods, which so far have required different intuitions, have different applications, and have been developed separately in various communities. Our algorithmic framework covers local SGD updates as well as synchronous and pairwise gossip updates on an adaptive network topology. We derive universal convergence rates for smooth (convex and non-convex) problems, and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).}
}
Endnote
%0 Conference Paper
%T A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
%A Anastasia Koloskova
%A Nicolas Loizou
%A Sadra Boreiri
%A Martin Jaggi
%A Sebastian Stich
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-koloskova20a
%I PMLR
%P 5381--5393
%U https://proceedings.mlr.press/v119/koloskova20a.html
%V 119
%X Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods, which so far have required different intuitions, have different applications, and have been developed separately in various communities. Our algorithmic framework covers local SGD updates as well as synchronous and pairwise gossip updates on an adaptive network topology. We derive universal convergence rates for smooth (convex and non-convex) problems, and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
APA
Koloskova, A., Loizou, N., Boreiri, S., Jaggi, M. & Stich, S. (2020). A Unified Theory of Decentralized SGD with Changing Topology and Local Updates. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:5381-5393. Available from https://proceedings.mlr.press/v119/koloskova20a.html.
