Local SGD: Unified Theory and New Efficient Methods

Eduard Gorbunov, Filip Hanzely, Peter Richtarik
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3556-3564, 2021.

Abstract

We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as special cases of our general framework, including Local SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both random and deterministic numbers of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators which are able to adjust the fixed points of local iterations for faster convergence. As an application of our framework, we develop multiple novel FL optimizers which are superior to existing methods. In particular, we develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
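To make the basic scheme concrete, here is a minimal sketch of the local SGD / FedAvg communication pattern the paper analyzes: each worker runs several local gradient steps from a shared point, and the local iterates are then averaged in one communication round. This is an illustrative toy, not the paper's general framework; it uses deterministic full gradients (so the local steps reduce to local GD, one special case covered by the analysis), a fixed number of local steps, and hypothetical names such as `local_sgd`.

```python
import numpy as np

def local_sgd(local_grads, x0, rounds, local_steps, lr):
    """Run `rounds` communication rounds of local SGD.

    local_grads: one gradient function per worker (heterogeneous data).
    Each round, every worker takes `local_steps` gradient steps from the
    shared point x, and the resulting local iterates are averaged.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        local_iterates = []
        for grad in local_grads:
            xi = x.copy()
            for _ in range(local_steps):
                xi = xi - lr * grad(xi)  # local update, no communication
            local_iterates.append(xi)
        x = np.mean(local_iterates, axis=0)  # one communication round
    return x

# Toy heterogeneous setup: worker i minimizes 0.5 * (x - c_i)^2,
# so grad_i(x) = x - c_i and the global minimizer is the mean of the c_i.
c = [0.0, 1.0, 5.0]
grads = [lambda x, ci=ci: x - ci for ci in c]
x_final = local_sgd(grads, np.array([10.0]), rounds=200, local_steps=5, lr=0.1)
print(x_final)  # converges to [2.0], the minimizer of the average objective
```

In this equal-curvature example the averaged fixed point coincides with the true minimizer; with unequal local curvatures plain local SGD drifts to a biased fixed point, which is the issue the paper's shifted gradient estimators are designed to correct.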

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-gorbunov21a,
  title     = {Local SGD: Unified Theory and New Efficient Methods},
  author    = {Gorbunov, Eduard and Hanzely, Filip and Richtarik, Peter},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {3556--3564},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/gorbunov21a/gorbunov21a.pdf},
  url       = {http://proceedings.mlr.press/v130/gorbunov21a.html},
  abstract  = {We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as a special case of our general framework, including Local SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both random and deterministic number of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators which are able to adjust the fixed points of local iterations for faster convergence. As an application of our framework, we develop multiple novel FL optimizers which are superior to existing methods. In particular, we develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.}
}
Endnote
%0 Conference Paper
%T Local SGD: Unified Theory and New Efficient Methods
%A Eduard Gorbunov
%A Filip Hanzely
%A Peter Richtarik
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-gorbunov21a
%I PMLR
%P 3556--3564
%U http://proceedings.mlr.press/v130/gorbunov21a.html
%V 130
%X We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as a special case of our general framework, including Local SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both random and deterministic number of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators which are able to adjust the fixed points of local iterations for faster convergence. As an application of our framework, we develop multiple novel FL optimizers which are superior to existing methods. In particular, we develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
APA
Gorbunov, E., Hanzely, F. &amp; Richtarik, P. (2021). Local SGD: Unified Theory and New Efficient Methods. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:3556-3564. Available from http://proceedings.mlr.press/v130/gorbunov21a.html.