Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm

Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:26215-26240, 2024.

Abstract

This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due to decentralization and a detrimental impact of poorly-connected communication graphs on generalization. On the contrary, we show, for convex, strongly convex and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter. We then argue that this result stems from a worst-case analysis, and we provide a refined optimization-dependent generalization bound for general convex functions. This new bound reveals that the choice of graph can in fact improve the worst-case bound in certain regimes, and that, surprisingly, a poorly-connected graph can even be beneficial for generalization.
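
For readers unfamiliar with the algorithm the abstract analyzes, the sketch below illustrates one round of D-SGD in its standard form: every node takes a local stochastic gradient step and then averages its iterate with those of its neighbors through a doubly stochastic mixing matrix W encoding the communication graph. This is an illustrative toy example (a least-squares objective, with names chosen here for exposition), not the authors' code, and some formulations apply the averaging before the gradient step.

import numpy as np

def dsgd_round(X, local_grads, W, lr):
    """One D-SGD round: each node takes a local stochastic gradient step,
    then gossip-averages its iterate with its neighbors via the mixing matrix W.
    X[i] is node i's current model and local_grads[i] its stochastic gradient."""
    X_half = X - lr * local_grads   # local SGD step at every node
    return W @ X_half               # averaging step over the communication graph

# Toy run: 4 nodes on a ring, each holding a local least-squares problem.
rng = np.random.default_rng(0)
n_nodes, n_samples, dim = 4, 10, 3
A = rng.normal(size=(n_nodes, n_samples, dim))   # local features
b = rng.normal(size=(n_nodes, n_samples))        # local targets
X = np.zeros((n_nodes, dim))                     # one model per node

# Doubly stochastic mixing matrix of the ring graph (self weight 1/2, neighbors 1/4).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

for t in range(200):
    idx = rng.integers(0, n_samples, size=n_nodes)       # one random sample per node
    grads = np.stack([(A[i, idx[i]] @ X[i] - b[i, idx[i]]) * A[i, idx[i]]
                      for i in range(n_nodes)])          # grad of 0.5*(a^T x - b)^2
    X = dsgd_round(X, grads, W, lr=0.05)

The communication graph enters the algorithm only through W; the "poorly-connected" graphs discussed in the abstract are those whose mixing matrix has a small spectral gap, which slows the averaging step.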

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-le-bars24a,
  title     = {Improved Stability and Generalization Guarantees of the Decentralized {SGD} Algorithm},
  author    = {Le Bars, Batiste and Bellet, Aur\'{e}lien and Tommasi, Marc and Scaman, Kevin and Neglia, Giovanni},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {26215--26240},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/le-bars24a/le-bars24a.pdf},
  url       = {https://proceedings.mlr.press/v235/le-bars24a.html},
  abstract  = {This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due to decentralization and a detrimental impact of poorly-connected communication graphs on generalization. On the contrary, we show, for convex, strongly convex and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter. We then argue that this result stems from a worst-case analysis, and we provide a refined optimization-dependent generalization bound for general convex functions. This new bound reveals that the choice of graph can in fact improve the worst-case bound in certain regimes, and that, surprisingly, a poorly-connected graph can even be beneficial for generalization.}
}
Endnote
%0 Conference Paper
%T Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm
%A Batiste Le Bars
%A Aurélien Bellet
%A Marc Tommasi
%A Kevin Scaman
%A Giovanni Neglia
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-le-bars24a
%I PMLR
%P 26215--26240
%U https://proceedings.mlr.press/v235/le-bars24a.html
%V 235
%X This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due to decentralization and a detrimental impact of poorly-connected communication graphs on generalization. On the contrary, we show, for convex, strongly convex and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter. We then argue that this result stems from a worst-case analysis, and we provide a refined optimization-dependent generalization bound for general convex functions. This new bound reveals that the choice of graph can in fact improve the worst-case bound in certain regimes, and that, surprisingly, a poorly-connected graph can even be beneficial for generalization.
APA
Le Bars, B., Bellet, A., Tommasi, M., Scaman, K. & Neglia, G. (2024). Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:26215-26240. Available from https://proceedings.mlr.press/v235/le-bars24a.html.