Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg

Like Jian, Dong Liu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:27399-27425, 2025.

Abstract

Federated learning (FL) enables decentralized clients to train a model collaboratively without sharing local data. A key distinction between FL and centralized learning is that clients’ data are not independent and identically distributed (non-IID), which poses significant challenges in training a global model that generalizes well across heterogeneous local data distributions. In this paper, we analyze the convergence of overparameterized FedAvg with gradient descent (GD). We prove that the impact of data heterogeneity diminishes as the width of neural networks increases, ultimately vanishing when the width approaches infinity. In the infinite-width regime, we further prove that both the global and local models in FedAvg behave as linear models, and that FedAvg achieves the same generalization performance as centralized learning with the same number of GD iterations. Extensive experiments validate our theoretical findings across various network architectures, loss functions, and optimization methods.
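To make the analyzed training procedure concrete, here is a minimal sketch of FedAvg with full-batch GD on a linear least-squares model, i.e., the kind of linear behavior the paper proves wide networks approach in the infinite-width regime. The synthetic heterogeneous client data, learning rate, and uniform model averaging below are illustrative assumptions for this sketch, not the authors' experimental setup.

# Minimal FedAvg-with-GD sketch on a linear least-squares model (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def local_gd(w, X, y, lr, steps):
    """Run `steps` full-batch GD iterations on the local squared loss."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of 0.5 * mean((Xw - y)^2)
        w = w - lr * grad
    return w

def fedavg(client_data, w0, lr=0.1, rounds=50, local_steps=1):
    """FedAvg: broadcast the global model, run local GD, average the local models."""
    w = w0.copy()
    for _ in range(rounds):
        local_models = [local_gd(w, X, y, lr, local_steps) for X, y in client_data]
        # Uniform averaging; weighting by local sample counts is also common.
        w = np.mean(local_models, axis=0)
    return w

# Heterogeneous clients: each draws features from a differently shifted distribution.
d, n = 5, 100
w_true = rng.normal(size=d)
client_data = []
for shift in (-1.0, 0.0, 1.0):
    X = rng.normal(loc=shift, size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    client_data.append((X, y))

w_fed = fedavg(client_data, w0=np.zeros(d))
print("distance to ground truth:", np.linalg.norm(w_fed - w_true))

With local_steps=1 and equally sized clients, the averaged update coincides exactly with centralized full-batch GD on the pooled loss; with more local steps, heterogeneous clients drift apart, which is the effect the paper shows is damped as network width grows.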

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-jian25a,
  title     = {Widening the Network Mitigates the Impact of Data Heterogeneity on {F}ed{A}vg},
  author    = {Jian, Like and Liu, Dong},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {27399--27425},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/jian25a/jian25a.pdf},
  url       = {https://proceedings.mlr.press/v267/jian25a.html}
}
Endnote
%0 Conference Paper
%T Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg
%A Like Jian
%A Dong Liu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-jian25a
%I PMLR
%P 27399--27425
%U https://proceedings.mlr.press/v267/jian25a.html
%V 267
APA
Jian, L. & Liu, D. (2025). Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:27399-27425. Available from https://proceedings.mlr.press/v267/jian25a.html.
