Tighter Theory for Local SGD on Identical and Heterogeneous Data

Ahmed Khaled; Konstantin Mishchenko; Peter Richtarik

Tighter Theory for Local SGD on Identical and Heterogeneous Data

Ahmed Khaled, Konstantin Mishchenko, Peter Richtarik

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4519-4529, 2020.

Abstract

We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.

Cite this Paper

BibTeX


@InProceedings{pmlr-v108-bayoumi20a,
  title = 	 {Tighter Theory for Local SGD on Identical and Heterogeneous Data},
  author =       {Khaled, Ahmed and Mishchenko, Konstantin and Richtarik, Peter},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {4519--4529},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/bayoumi20a/bayoumi20a.pdf},
  url = 	 {https://proceedings.mlr.press/v108/bayoumi20a.html},
  abstract = 	 {We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data  heterogeneity on the performance of local SGD.}
}

Endnote

%0 Conference Paper
%T Tighter Theory for Local SGD on Identical and Heterogeneous Data
%A Ahmed Khaled
%A Konstantin Mishchenko
%A Peter Richtarik
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra	
%F pmlr-v108-bayoumi20a
%I PMLR
%P 4519--4529
%U https://proceedings.mlr.press/v108/bayoumi20a.html
%V 108
%X We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data  heterogeneity on the performance of local SGD.

APA


Khaled, A., Mishchenko, K. & Richtarik, P.. (2020). Tighter Theory for Local SGD on Identical and Heterogeneous Data. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4519-4529 Available from https://proceedings.mlr.press/v108/bayoumi20a.html.

Tighter Theory for Local SGD on Identical and Heterogeneous Data

Abstract

Cite this Paper

Related Material