Scaffold with Stochastic Gradients: New Analysis with Linear Speed-Up

Paul Mangold, Alain Oliviero Durmus, Aymeric Dieuleveut, Eric Moulines
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:42902-42946, 2025.

Abstract

This paper proposes a novel analysis for the Scaffold algorithm, a popular method for dealing with data heterogeneity in federated learning. While its convergence in deterministic settings—where local control variates mitigate client drift—is well established, the impact of stochastic gradient updates on its performance is less understood. To address this problem, we first show that its global parameters and control variates define a Markov chain that converges to a stationary distribution in the Wasserstein distance. Leveraging this result, we prove that Scaffold achieves linear speed-up in the number of clients up to higher-order terms in the step size. Nevertheless, our analysis reveals that Scaffold retains a higher-order bias, similar to FedAvg, that does not decrease as the number of clients increases. This highlights opportunities for developing improved stochastic federated learning algorithms.
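For readers unfamiliar with the algorithm itself, below is a minimal, hypothetical sketch of one Scaffold round with stochastic gradients and per-client control variates, following the "Option II" control-variate update of Karimireddy et al. (2020). The quadratic client objectives, step sizes, noise model, and full participation are illustrative assumptions for this sketch, not the exact setting or variant analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)
N, d, K = 10, 5, 10            # clients, dimension, local steps (illustrative)
eta, eta_g = 0.01, 1.0         # local and global step sizes (illustrative)
sigma = 0.1                    # stochastic-gradient noise level (illustrative)

# Heterogeneous quadratic clients: f_i(x) = 0.5 * ||A_i x - b_i||^2
A = rng.normal(size=(N, d, d))
b = rng.normal(size=(N, d))

def stoch_grad(i, x):
    """Noisy gradient of client i's objective; the noise mimics mini-batching."""
    return A[i].T @ (A[i] @ x - b[i]) + sigma * rng.normal(size=d)

x = np.zeros(d)                # global parameters
c = np.zeros(d)                # global control variate
c_loc = np.zeros((N, d))       # per-client control variates

def scaffold_round(x, c, c_loc):
    """One communication round with full client participation."""
    dx, dc = np.zeros(d), np.zeros(d)
    for i in range(N):
        y = x.copy()
        for _ in range(K):     # drift-corrected local SGD steps
            y -= eta * (stoch_grad(i, y) - c_loc[i] + c)
        c_new = c_loc[i] - c + (x - y) / (K * eta)   # "Option II" control-variate update
        dx += (y - x) / N
        dc += (c_new - c_loc[i]) / N
        c_loc[i] = c_new
    return x + eta_g * dx, c + dc, c_loc

for _ in range(500):
    x, c, c_loc = scaffold_round(x, c, c_loc)

full_grad = np.mean([A[i].T @ (A[i] @ x - b[i]) for i in range(N)], axis=0)
print("average gradient norm after 500 rounds:", np.linalg.norm(full_grad))

The state carried across rounds by scaffold_round, namely the global parameters together with the control variates, is the kind of joint iterate the paper studies as a Markov chain converging to a stationary distribution.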

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-mangold25a,
  title     = {Scaffold with Stochastic Gradients: New Analysis with Linear Speed-Up},
  author    = {Mangold, Paul and Oliviero Durmus, Alain and Dieuleveut, Aymeric and Moulines, Eric},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {42902--42946},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/mangold25a/mangold25a.pdf},
  url       = {https://proceedings.mlr.press/v267/mangold25a.html},
  abstract  = {This paper proposes a novel analysis for the Scaffold algorithm, a popular method for dealing with data heterogeneity in federated learning. While its convergence in deterministic settings—where local control variates mitigate client drift—is well established, the impact of stochastic gradient updates on its performance is less understood. To address this problem, we first show that its global parameters and control variates define a Markov chain that converges to a stationary distribution in the Wasserstein distance. Leveraging this result, we prove that Scaffold achieves linear speed-up in the number of clients up to higher-order terms in the step size. Nevertheless, our analysis reveals that Scaffold retains a higher-order bias, similar to FedAvg, that does not decrease as the number of clients increases. This highlights opportunities for developing improved stochastic federated learning algorithms.}
}
Endnote
%0 Conference Paper
%T Scaffold with Stochastic Gradients: New Analysis with Linear Speed-Up
%A Paul Mangold
%A Alain Oliviero Durmus
%A Aymeric Dieuleveut
%A Eric Moulines
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-mangold25a
%I PMLR
%P 42902--42946
%U https://proceedings.mlr.press/v267/mangold25a.html
%V 267
%X This paper proposes a novel analysis for the Scaffold algorithm, a popular method for dealing with data heterogeneity in federated learning. While its convergence in deterministic settings—where local control variates mitigate client drift—is well established, the impact of stochastic gradient updates on its performance is less understood. To address this problem, we first show that its global parameters and control variates define a Markov chain that converges to a stationary distribution in the Wasserstein distance. Leveraging this result, we prove that Scaffold achieves linear speed-up in the number of clients up to higher-order terms in the step size. Nevertheless, our analysis reveals that Scaffold retains a higher-order bias, similar to FedAvg, that does not decrease as the number of clients increases. This highlights opportunities for developing improved stochastic federated learning algorithms.
APA
Mangold, P., Oliviero Durmus, A., Dieuleveut, A. & Moulines, E. (2025). Scaffold with Stochastic Gradients: New Analysis with Linear Speed-Up. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:42902-42946. Available from https://proceedings.mlr.press/v267/mangold25a.html.
