Tighter Theory for Local SGD on Identical and Heterogeneous Data

Ahmed Khaled, Konstantin Mishchenko, Peter Richtarik
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4519-4529, 2020.

Abstract

We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug in $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
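To make the setting concrete, the local SGD scheme analyzed in the paper alternates $H$ local stochastic gradient steps on each of $M$ workers with a communication round that averages the iterates. The sketch below illustrates this loop on a hypothetical toy problem (per-worker quadratics with different minimizers, standing in for heterogeneous data); the objectives, stepsize, and noise model are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: worker m holds f_m(x) = 0.5 * (x - b[m])^2,
# so the global objective (1/M) * sum_m f_m is minimized at mean(b).
M, H, rounds, lr = 4, 8, 50, 0.1
b = np.array([0.0, 1.0, 2.0, 3.0])   # heterogeneous data: different local minimizers
x = np.zeros(M)                       # current iterate on each worker

for r in range(rounds):
    for _ in range(H):                # H local SGD steps between communications
        noise = rng.normal(0.0, 0.01, size=M)
        x -= lr * ((x - b) + noise)   # stochastic gradient of f_m at x[m]
    x[:] = x.mean()                   # communication round: average all iterates

print(x[0])  # close to mean(b) = 1.5, the global minimizer
```

With $H=1$ this reduces to fully synchronized (minibatch-style) SGD, which is the consistency check the abstract refers to; larger $H$ trades communication for local drift, which is where the heterogeneity of the $b_m$ matters.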

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-bayoumi20a,
  title     = {Tighter Theory for Local SGD on Identical and Heterogeneous Data},
  author    = {Khaled, Ahmed and Mishchenko, Konstantin and Richtarik, Peter},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {4519--4529},
  year      = {2020},
  editor    = {Silvia Chiappa and Roberto Calandra},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/bayoumi20a/bayoumi20a.pdf},
  url       = {http://proceedings.mlr.press/v108/bayoumi20a.html},
  abstract  = {We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.}
}
Endnote
%0 Conference Paper
%T Tighter Theory for Local SGD on Identical and Heterogeneous Data
%A Ahmed Khaled
%A Konstantin Mishchenko
%A Peter Richtarik
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-bayoumi20a
%I PMLR
%P 4519--4529
%U http://proceedings.mlr.press/v108/bayoumi20a.html
%V 108
%X We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
APA
Khaled, A., Mishchenko, K., & Richtarik, P. (2020). Tighter Theory for Local SGD on Identical and Heterogeneous Data. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4519-4529. Available from http://proceedings.mlr.press/v108/bayoumi20a.html.