The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

Kumar Kshitij Patel; Margalit Glasgow; Ali Zindari; Lingxiao Wang; Sebastian U Stich; Ziheng Cheng; Nirmit Joshi; Nathan Srebro

The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Sebastian U Stich, Ziheng Cheng, Nirmit Joshi, Nathan Srebro

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:4115-4157, 2024.

Abstract

Local SGD is a popular optimization method in distributed learning, often outperforming mini-batch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing these assumptions can not capture local SGD’s effectiveness. We also demonstrate the min-max optimality of accelerated mini-batch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity. Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over mini-batch SGD when data heterogeneity is low.

Cite this Paper

BibTeX


@InProceedings{pmlr-v247-patel24a,
  title = 	 {The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication},
  author =       {Patel, Kumar Kshitij and Glasgow, Margalit and Zindari, Ali and Wang, Lingxiao and Stich, Sebastian U and Cheng, Ziheng and Joshi, Nirmit and Srebro, Nathan},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {4115--4157},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/patel24a/patel24a.pdf},
  url = 	 {https://proceedings.mlr.press/v247/patel24a.html},
  abstract = 	 {Local SGD is a popular optimization method in distributed learning, often outperforming mini-batch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing these assumptions can not capture local SGD’s effectiveness. We also demonstrate the min-max optimality of accelerated mini-batch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity.  Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over mini-batch SGD when data heterogeneity is low.}
}

Endnote

%0 Conference Paper
%T The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication
%A Kumar Kshitij Patel
%A Margalit Glasgow
%A Ali Zindari
%A Lingxiao Wang
%A Sebastian U Stich
%A Ziheng Cheng
%A Nirmit Joshi
%A Nathan Srebro
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth	
%F pmlr-v247-patel24a
%I PMLR
%P 4115--4157
%U https://proceedings.mlr.press/v247/patel24a.html
%V 247
%X Local SGD is a popular optimization method in distributed learning, often outperforming mini-batch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing these assumptions can not capture local SGD’s effectiveness. We also demonstrate the min-max optimality of accelerated mini-batch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity.  Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over mini-batch SGD when data heterogeneity is low.

APA


Patel, K.K., Glasgow, M., Zindari, A., Wang, L., Stich, S.U., Cheng, Z., Joshi, N. & Srebro, N.. (2024). The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:4115-4157 Available from https://proceedings.mlr.press/v247/patel24a.html.

The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

Abstract

Cite this Paper

Related Material