Stability and Generalization Analysis of Decentralized SGD: Sharper Bounds Beyond Lipschitzness and Smoothness

Shuang Zeng, Yunwen Lei
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:74098-74132, 2025.

Abstract

Decentralized SGD (D-SGD) is a popular optimization method to train large-scale machine learning models. In this paper, we study the generalization behavior of D-SGD for both smooth and nonsmooth problems by leveraging algorithmic stability. For convex and smooth problems, we develop stability bounds involving the training errors to show the benefit of optimization in generalization. This improves existing results by removing the Lipschitzness assumption and by implying fast rates under a low-noise condition. We also develop the first optimal stability-based generalization bounds for D-SGD applied to nonsmooth problems. We further develop optimization error bounds which imply minimax optimal excess risk rates. The novelty of our analysis lies in an error decomposition that exploits the co-coercivity of functions, together with the control of a neighboring-consensus error.
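For readers unfamiliar with D-SGD, the sketch below shows the standard gossip-then-local-step formulation of one round; it is a hypothetical illustration under common assumptions (a doubly stochastic mixing matrix, gradients evaluated at the pre-mixing local models), not code or notation taken from the paper.

```python
import numpy as np

def dsgd_round(X, W, grads, lr):
    """One round of (standard) decentralized SGD -- illustrative sketch only.

    X     : (n_nodes, dim) current local models, one row per node.
    W     : (n_nodes, n_nodes) doubly stochastic mixing (gossip) matrix.
    grads : (n_nodes, dim) stochastic gradients at each node's local model.
    lr    : step size.
    """
    # Gossip step (W @ X): each node averages its neighbors' models,
    # then takes a local stochastic gradient step.
    return W @ X - lr * grads

# Toy usage: 4 nodes on a ring, node i holds f_i(x) = 0.5 * ||x - target_i||^2,
# so the minimizer of the average loss is the mean of the targets.
rng = np.random.default_rng(0)
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])   # doubly stochastic ring mixing matrix
targets = rng.normal(size=(4, 3))
X = np.zeros((4, 3))
for t in range(300):
    grads = (X - targets) + 0.01 * rng.normal(size=X.shape)  # noisy local gradients
    X = dsgd_round(X, W, grads, lr=0.1)
# The node-averaged model approaches the minimizer of the average loss.
print(X.mean(axis=0), targets.mean(axis=0))
```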

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zeng25d,
  title     = {Stability and Generalization Analysis of Decentralized {SGD}: Sharper Bounds Beyond Lipschitzness and Smoothness},
  author    = {Zeng, Shuang and Lei, Yunwen},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {74098--74132},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zeng25d/zeng25d.pdf},
  url       = {https://proceedings.mlr.press/v267/zeng25d.html},
  abstract  = {Decentralized SGD (D-SGD) is a popular optimization method to train large-scale machine learning models. In this paper, we study the generalization behavior of D-SGD for both smooth and nonsmooth problems by leveraging the algorithm stability. For convex and smooth problems, we develop stability bounds involving the training errors to show the benefit of optimization in generalization. This improves the existing results by removing the Lipschitzness assumption and implying fast rates in a low-noise condition. We also develop the first optimal stability-based generalization bounds for D-SGD applied to nonsmooth problems. We further develop optimization error bounds which imply minimax optimal excess risk rates. Our novelty in the analysis consists of an error decomposition to use the co-coercivity of functions as well as the control of a neighboring-consensus error.}
}
Endnote
%0 Conference Paper
%T Stability and Generalization Analysis of Decentralized SGD: Sharper Bounds Beyond Lipschitzness and Smoothness
%A Shuang Zeng
%A Yunwen Lei
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zeng25d
%I PMLR
%P 74098--74132
%U https://proceedings.mlr.press/v267/zeng25d.html
%V 267
%X Decentralized SGD (D-SGD) is a popular optimization method to train large-scale machine learning models. In this paper, we study the generalization behavior of D-SGD for both smooth and nonsmooth problems by leveraging the algorithm stability. For convex and smooth problems, we develop stability bounds involving the training errors to show the benefit of optimization in generalization. This improves the existing results by removing the Lipschitzness assumption and implying fast rates in a low-noise condition. We also develop the first optimal stability-based generalization bounds for D-SGD applied to nonsmooth problems. We further develop optimization error bounds which imply minimax optimal excess risk rates. Our novelty in the analysis consists of an error decomposition to use the co-coercivity of functions as well as the control of a neighboring-consensus error.
APA
Zeng, S. & Lei, Y. (2025). Stability and Generalization Analysis of Decentralized SGD: Sharper Bounds Beyond Lipschitzness and Smoothness. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:74098-74132. Available from https://proceedings.mlr.press/v267/zeng25d.html.
