Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often!

Milad Sefidgaran; Romain Chor; Abdellatif Zaidi; Yijun Wan

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:44093-44135, 2024.

Abstract

We investigate the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, we study the evolution of the generalization error with the number of communication rounds $R$ between $K$ clients and a parameter server (PS), i.e., the effect on the generalization error of how often the clients’ local models are aggregated at PS. In our setup, the more the clients communicate with PS, the less data they use for local training in each round, such that the amount of training data per client is identical for distinct values of $R$. We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account explicitly for the effect of the number of rounds $R$, in addition to the number of participating devices $K$ and the individual dataset size $n$. The bounds, which apply to a large class of loss functions and learning algorithms, appear to be the first of their kind for the FL setting. Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM) and derive (more) explicit bounds in this case. In particular, we show that the generalization bound of FSVM increases with $R$, suggesting that more frequent communication with PS diminishes the generalization power. This implies that the population risk decreases more slowly with $R$ than does the empirical risk. Moreover, our bound suggests that the generalization error of FSVM decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$. Finally, we provide experimental results obtained using neural networks (ResNet-56) which show evidence that not only may our observations for FSVM hold more generally but also that the population risk may even start to increase beyond some value of $R$.
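The training protocol described above (a fixed budget of $n$ samples per client, spread over $R$ rounds) can be illustrated with a minimal sketch. This is not the authors' FSVM algorithm; it rests on assumptions of our own: each client's $n$ samples are split into $R$ equal, disjoint chunks (so $R$ is assumed to divide $n$); in round $r$ every client runs one SGD pass on an L2-regularized hinge loss over its $r$-th chunk, starting from the current global model; and the PS then takes a plain average of the $K$ local models. The function names `split_rounds`, `hinge_sgd` and `federated_train`, and the parameters `lr` and `lam`, are hypothetical.

```python
import random


def split_rounds(data, R):
    """Split one client's n samples into R disjoint chunks of n/R samples each.

    Assumption of this sketch: R divides n, so every round uses the same number of samples.
    """
    n = len(data)
    if n % R != 0:
        raise ValueError("this sketch assumes R divides n")
    m = n // R
    return [data[r * m:(r + 1) * m] for r in range(R)]


def hinge_sgd(w, chunk, lr, lam):
    """One SGD pass over (x, y) pairs, y in {-1, +1}, on lam/2*||w||^2 + max(0, 1 - y<w, x>)."""
    w = list(w)  # copy so the global model passed in is not modified
    for x, y in chunk:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Subgradient: lam*w, minus y*x when the hinge is active (margin < 1)
        grad = [lam * wi - (y * xi if margin < 1 else 0.0) for wi, xi in zip(w, x)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def federated_train(client_data, R, lr=0.1, lam=0.01):
    """K clients, R rounds: in round r each client trains on its r-th chunk, then the PS averages."""
    K = len(client_data)
    d = len(client_data[0][0][0])  # feature dimension, read from the first sample
    chunks = [split_rounds(data, R) for data in client_data]
    w_global = [0.0] * d
    for r in range(R):
        # Every client starts round r from the current global model
        local = [hinge_sgd(w_global, chunks[k][r], lr, lam) for k in range(K)]
        # Parameter server: plain (unweighted) average of the K local models
        w_global = [sum(wk[j] for wk in local) / K for j in range(d)]
    return w_global


if __name__ == "__main__":
    rng = random.Random(0)

    def make_client(n, d=2):
        # Synthetic, linearly separable toy data (illustration only)
        data = []
        for _ in range(n):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            data.append((x, 1 if x[0] + x[1] > 0 else -1))
        return data

    clients = [make_client(40) for _ in range(5)]  # K = 5 clients, n = 40 samples each
    for R in (1, 2, 4, 8):
        w = federated_train(clients, R)
        print(R, [round(v, 3) for v in w])
```

Because each client uses every one of its $n$ samples exactly once for any admissible $R$, changing $R$ in this sketch changes only how often the local models are aggregated at the PS, which is the quantity whose effect on the generalization error the bounds above track. With a single client ($K = 1$) the averaging step is the identity, so the result does not depend on $R$.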

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-sefidgaran24a,
  title     = {Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often!},
  author    = {Sefidgaran, Milad and Chor, Romain and Zaidi, Abdellatif and Wan, Yijun},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {44093--44135},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/sefidgaran24a/sefidgaran24a.pdf},
  url       = {https://proceedings.mlr.press/v235/sefidgaran24a.html},
  abstract  = {We investigate the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, we study the evolution of the generalization error with the number of communication rounds $R$ between $K$ clients and a parameter server (PS), i.e. the effect on the generalization error of how often the clients’ local models are aggregated at PS. In our setup, the more the clients communicate with PS the less data they use for local training in each round, such that the amount of training data per client is identical for distinct values of $R$. We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account explicitly for the effect of the number of rounds $R$, in addition to the number of participating devices $K$ and individual datasets size $n$. The bounds, which apply to a large class of loss functions and learning algorithms, appear to be the first of their kind for the FL setting. Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM); and derive (more) explicit bounds in this case. In particular, we show that the generalization bound of FSVM increases with $R$, suggesting that more frequent communication with PS diminishes the generalization power. This implies that the population risk decreases less fast with $R$ than does the empirical risk. Moreover, our bound suggests that the generalization error of FSVM decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$. Finally, we provide experimental results obtained using neural networks (ResNet-56) which show evidence that not only may our observations for FSVM hold more generally but also that the population risk may even start to increase beyond some value of $R$.}
}

Endnote

%0 Conference Paper
%T Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often!
%A Milad Sefidgaran
%A Romain Chor
%A Abdellatif Zaidi
%A Yijun Wan
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-sefidgaran24a
%I PMLR
%P 44093--44135
%U https://proceedings.mlr.press/v235/sefidgaran24a.html
%V 235
%X We investigate the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, we study the evolution of the generalization error with the number of communication rounds $R$ between $K$ clients and a parameter server (PS), i.e. the effect on the generalization error of how often the clients’ local models are aggregated at PS. In our setup, the more the clients communicate with PS the less data they use for local training in each round, such that the amount of training data per client is identical for distinct values of $R$. We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account explicitly for the effect of the number of rounds $R$, in addition to the number of participating devices $K$ and individual datasets size $n$. The bounds, which apply to a large class of loss functions and learning algorithms, appear to be the first of their kind for the FL setting. Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM); and derive (more) explicit bounds in this case. In particular, we show that the generalization bound of FSVM increases with $R$, suggesting that more frequent communication with PS diminishes the generalization power. This implies that the population risk decreases less fast with $R$ than does the empirical risk. Moreover, our bound suggests that the generalization error of FSVM decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$. Finally, we provide experimental results obtained using neural networks (ResNet-56) which show evidence that not only may our observations for FSVM hold more generally but also that the population risk may even start to increase beyond some value of $R$.

APA


Sefidgaran, M., Chor, R., Zaidi, A., & Wan, Y. (2024). Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often! Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:44093-44135. Available from https://proceedings.mlr.press/v235/sefidgaran24a.html.