The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Sebastian U Stich, Ziheng Cheng, Nirmit Joshi, Nathan Srebro
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:4115-4157, 2024.

Abstract

Local SGD is a popular optimization method in distributed learning, often outperforming mini-batch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing these assumptions cannot capture local SGD’s effectiveness. We also demonstrate the min-max optimality of accelerated mini-batch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity. Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over mini-batch SGD when data heterogeneity is low.
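For readers unfamiliar with the two algorithms being compared, the following is a minimal sketch (not the paper's experimental setup) of local SGD with intermittent communication versus mini-batch SGD with the same communication budget, on a toy heterogeneous quadratic objective. The client count M, local steps K, rounds R, step size, and the per-client optima a_m are illustrative choices, not values from the paper.

```python
import numpy as np

def local_sgd(grad, x0, M, K, R, lr):
    """Local SGD with intermittent communication: each of the M clients runs
    K local SGD steps per round, then the local iterates are averaged
    (R communication rounds in total)."""
    x = np.array(x0, dtype=float)
    for _ in range(R):
        local_iterates = []
        for m in range(M):
            xm = x.copy()
            for _ in range(K):
                xm -= lr * grad(m, xm)       # stochastic gradient on client m's data
            local_iterates.append(xm)
        x = np.mean(local_iterates, axis=0)  # communicate: average the local models
    return x

def minibatch_sgd(grad, x0, M, K, R, lr):
    """Mini-batch SGD baseline with the same communication budget: each round
    averages M*K stochastic gradients evaluated at the shared iterate."""
    x = np.array(x0, dtype=float)
    for _ in range(R):
        g = np.mean([grad(m, x) for m in range(M) for _ in range(K)], axis=0)
        x -= lr * g
    return x

# Toy heterogeneous problem: client m minimizes E[0.5 * ||x - a_m - noise||^2],
# so the client optima a_m differ (first-order heterogeneity across clients).
rng = np.random.default_rng(0)
M, d = 4, 5
a = rng.normal(size=(M, d))                       # per-client optima (hypothetical data)

def grad(m, x):
    return x - a[m] + 0.1 * rng.normal(size=d)    # stochastic gradient for client m

x0 = np.zeros(d)
x_star = a.mean(axis=0)                           # minimizer of the average objective
x_local = local_sgd(grad, x0, M=M, K=20, R=10, lr=0.1)
x_mb = minibatch_sgd(grad, x0, M=M, K=20, R=10, lr=0.1)
print("local SGD error     :", np.linalg.norm(x_local - x_star))
print("mini-batch SGD error:", np.linalg.norm(x_mb - x_star))
```

Both methods use R rounds of communication and M*K stochastic gradients per round; they differ only in whether gradients are applied locally between rounds (local SGD) or pooled at a single shared iterate (mini-batch SGD), which is exactly the trade-off the paper's lower and upper bounds characterize.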

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-patel24a, title = {The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication}, author = {Patel, Kumar Kshitij and Glasgow, Margalit and Zindari, Ali and Wang, Lingxiao and Stich, Sebastian U and Cheng, Ziheng and Joshi, Nirmit and Srebro, Nathan}, booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory}, pages = {4115--4157}, year = {2024}, editor = {Agrawal, Shipra and Roth, Aaron}, volume = {247}, series = {Proceedings of Machine Learning Research}, month = {30 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v247/patel24a/patel24a.pdf}, url = {https://proceedings.mlr.press/v247/patel24a.html}, abstract = {Local SGD is a popular optimization method in distributed learning, often outperforming mini-batch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing these assumptions can not capture local SGD’s effectiveness. We also demonstrate the min-max optimality of accelerated mini-batch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity. Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over mini-batch SGD when data heterogeneity is low.} }
Endnote
%0 Conference Paper %T The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication %A Kumar Kshitij Patel %A Margalit Glasgow %A Ali Zindari %A Lingxiao Wang %A Sebastian U Stich %A Ziheng Cheng %A Nirmit Joshi %A Nathan Srebro %B Proceedings of Thirty Seventh Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2024 %E Shipra Agrawal %E Aaron Roth %F pmlr-v247-patel24a %I PMLR %P 4115--4157 %U https://proceedings.mlr.press/v247/patel24a.html %V 247 %X Local SGD is a popular optimization method in distributed learning, often outperforming mini-batch SGD. Despite this practical success, proving the efficiency of local SGD has been difficult, creating a significant gap between theory and practice. We provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing these assumptions can not capture local SGD’s effectiveness. We also demonstrate the min-max optimality of accelerated mini-batch SGD under these assumptions. Our findings emphasize the need for improved modeling of data heterogeneity. Under higher-order assumptions, we provide new upper bounds that verify the dominance of local SGD over mini-batch SGD when data heterogeneity is low.
APA
Patel, K.K., Glasgow, M., Zindari, A., Wang, L., Stich, S.U., Cheng, Z., Joshi, N. & Srebro, N.. (2024). The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:4115-4157 Available from https://proceedings.mlr.press/v247/patel24a.html.
