FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks

Bingqing Song, Prashant Khanduri, Xinwei Zhang, Jinfeng Yi, Mingyi Hong
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:32304-32330, 2023.

Abstract

Federated Learning (FL) is a distributed learning paradigm that allows multiple clients to learn a joint model by utilizing privately held data at each client. Significant research efforts have been devoted to developing advanced algorithms that deal with the situation where the data at individual clients have heterogeneous distributions. In this work, we show that data heterogeneity can be dealt with from a different perspective. That is, by utilizing a certain overparameterized multi-layer neural network at each client, even the vanilla FedAvg (a.k.a. Local SGD) algorithm can accurately optimize the training problem: when each client has a neural network with one wide layer of size $N$ (where $N$ is the total number of training samples), followed by layers of smaller widths, FedAvg converges linearly to a solution that achieves (almost) zero training loss, without requiring any assumptions on the clients’ data distributions. To our knowledge, this is the first work that demonstrates such resilience to data heterogeneity for FedAvg when trained on multi-layer neural networks. Our experiments also confirm that neural networks of large size can achieve better and more stable performance for FL problems.
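
To make the setting concrete, below is a minimal PyTorch sketch of vanilla FedAvg (Local SGD) applied to clients that each train an overparameterized MLP whose first hidden layer has width $N$ (the total number of training samples), followed by narrower layers. This is an illustration only, not the authors' code: the synthetic client data, the width of the layers after the first, the squared loss, the sample-count weighting at the server, and all hyperparameters (learning rate, local steps, rounds) are assumptions chosen for the sketch.

```python
# Illustrative sketch of FedAvg (Local SGD) with a "one wide layer" MLP.
# All data and hyperparameters below are assumptions for illustration only.
import copy
import torch
import torch.nn as nn

def make_model(d_in, N, d_out):
    # One wide hidden layer of width N (total sample count), then smaller layers.
    return nn.Sequential(
        nn.Linear(d_in, N), nn.ReLU(),
        nn.Linear(N, 64), nn.ReLU(),
        nn.Linear(64, d_out),
    )

def local_sgd(model, data, targets, lr, local_steps):
    # Plain SGD on the client's private data (the "local update" of FedAvg).
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(local_steps):
        opt.zero_grad()
        loss_fn(model(data), targets).backward()
        opt.step()
    return model.state_dict()

def fedavg(client_data, rounds=200, local_steps=5, lr=1e-2):
    # client_data: list of (X_m, y_m) tensors, one pair per client.
    N = sum(X.shape[0] for X, _ in client_data)          # total number of samples
    d_in, d_out = client_data[0][0].shape[1], client_data[0][1].shape[1]
    global_model = make_model(d_in, N, d_out)
    weights = [X.shape[0] / N for X, _ in client_data]   # weight by sample count
    for _ in range(rounds):
        states = []
        for X, y in client_data:
            local = copy.deepcopy(global_model)          # broadcast global model
            states.append(local_sgd(local, X, y, lr, local_steps))
        # Server aggregation: weighted average of client parameters.
        avg = {k: sum(w * s[k] for w, s in zip(weights, states))
               for k in states[0]}
        global_model.load_state_dict(avg)
    return global_model

if __name__ == "__main__":
    torch.manual_seed(0)
    # Two clients with deliberately heterogeneous (non-i.i.d.) synthetic data.
    clients = [(torch.randn(20, 10), torch.randn(20, 1)),
               (torch.randn(30, 10) + 3.0, torch.randn(30, 1))]
    model = fedavg(clients)
```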

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-song23e,
  title     = {{F}ed{A}vg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks},
  author    = {Song, Bingqing and Khanduri, Prashant and Zhang, Xinwei and Yi, Jinfeng and Hong, Mingyi},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {32304--32330},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/song23e/song23e.pdf},
  url       = {https://proceedings.mlr.press/v202/song23e.html},
  abstract  = {Federated Learning (FL) is a distributed learning paradigm that allows multiple clients to learn a joint model by utilizing privately held data at each client. Significant research efforts have been devoted to develop advanced algorithms that deal with the situation where the data at individual clients have heterogeneous distributions. In this work, we show that data heterogeneity can be dealt from a different perspective. That is, by utilizing a certain overparameterized multi-layer neural network at each client, even the vanilla FedAvg (a.k.a. the Local SGD) algorithm can accurately optimize the training problem: When each client has a neural network with one wide layer of size $N$ (where $N$ is the number of total training samples), followed by layers of smaller widths, FedAvg converges linearly to a solution that achieves (almost) zero training loss, without requiring any assumptions on the clients’ data distributions. To our knowledge, this is the first work that demonstrates such resilience to data heterogeneity for FedAvg when trained on multi-layer neural networks. Our experiments also confirm that, neural networks of large size can achieve better and more stable performance for FL problems.}
}
Endnote
%0 Conference Paper
%T FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks
%A Bingqing Song
%A Prashant Khanduri
%A Xinwei Zhang
%A Jinfeng Yi
%A Mingyi Hong
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-song23e
%I PMLR
%P 32304--32330
%U https://proceedings.mlr.press/v202/song23e.html
%V 202
%X Federated Learning (FL) is a distributed learning paradigm that allows multiple clients to learn a joint model by utilizing privately held data at each client. Significant research efforts have been devoted to develop advanced algorithms that deal with the situation where the data at individual clients have heterogeneous distributions. In this work, we show that data heterogeneity can be dealt from a different perspective. That is, by utilizing a certain overparameterized multi-layer neural network at each client, even the vanilla FedAvg (a.k.a. the Local SGD) algorithm can accurately optimize the training problem: When each client has a neural network with one wide layer of size $N$ (where $N$ is the number of total training samples), followed by layers of smaller widths, FedAvg converges linearly to a solution that achieves (almost) zero training loss, without requiring any assumptions on the clients’ data distributions. To our knowledge, this is the first work that demonstrates such resilience to data heterogeneity for FedAvg when trained on multi-layer neural networks. Our experiments also confirm that, neural networks of large size can achieve better and more stable performance for FL problems.
APA
Song, B., Khanduri, P., Zhang, X., Yi, J., & Hong, M. (2023). FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:32304-32330. Available from https://proceedings.mlr.press/v202/song23e.html.