Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning

Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xin He, Bo Han, Xiaowen Chu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:21111-21132, 2022.

Abstract

In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream works focus on correcting client drift. We propose a different approach named virtual homogeneity learning (VHL) to directly “rectify” the data heterogeneity. In particular, VHL conducts FL with a virtual homogeneous dataset crafted to satisfy two conditions: containing no private information and being separable. The virtual dataset can be generated from pure noise shared across clients, aiming to calibrate the features from the heterogeneous clients. Theoretically, we prove that VHL can achieve provable generalization performance on the natural distribution. Empirically, we demonstrate that VHL endows FL with drastically improved convergence speed and generalization performance. VHL is the first attempt towards using a virtual dataset to address data heterogeneity, offering new and effective means to FL.
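The core construction described above — a shared, separable virtual dataset generated from pure noise, identical across clients — can be illustrated with a minimal sketch. This is not the authors' implementation; `make_virtual_dataset`, the anchor-plus-noise sampling scheme, and all parameter names are hypothetical choices used only to show how every client can derive the same labeled noise data from a shared seed without exchanging private information.

```python
import numpy as np

def make_virtual_dataset(num_classes, per_class, dim, seed=0):
    """Craft a shared, label-separable virtual dataset from pure noise.

    Every client calls this with the same seed, so all clients hold an
    identical copy without sharing any private data. Separability is
    encouraged here by drawing each class from noise centred at a distinct
    random anchor (one simple choice; the paper itself uses generated noise).
    """
    rng = np.random.default_rng(seed)
    # One well-separated random centre per class.
    anchors = rng.normal(size=(num_classes, dim)) * 5.0
    xs, ys = [], []
    for c in range(num_classes):
        # Unit-variance noise around the class anchor.
        xs.append(anchors[c] + rng.normal(size=(per_class, dim)))
        ys.append(np.full(per_class, c))
    return np.concatenate(xs), np.concatenate(ys)
```

In a VHL-style setup, each client would mix this virtual data into its local updates so that features of the virtual classes are calibrated across clients; because the seed is public and the data is noise, nothing private is revealed.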

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-tang22d,
  title     = {Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning},
  author    = {Tang, Zhenheng and Zhang, Yonggang and Shi, Shaohuai and He, Xin and Han, Bo and Chu, Xiaowen},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {21111--21132},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/tang22d/tang22d.pdf},
  url       = {https://proceedings.mlr.press/v162/tang22d.html},
  abstract  = {In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream works focus on correcting client drift. We propose a different approach named virtual homogeneity learning (VHL) to directly “rectify” the data heterogeneity. In particular, VHL conducts FL with a virtual homogeneous dataset crafted to satisfy two conditions: containing no private information and being separable. The virtual dataset can be generated from pure noise shared across clients, aiming to calibrate the features from the heterogeneous clients. Theoretically, we prove that VHL can achieve provable generalization performance on the natural distribution. Empirically, we demonstrate that VHL endows FL with drastically improved convergence speed and generalization performance. VHL is the first attempt towards using a virtual dataset to address data heterogeneity, offering new and effective means to FL.}
}
Endnote
%0 Conference Paper
%T Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning
%A Zhenheng Tang
%A Yonggang Zhang
%A Shaohuai Shi
%A Xin He
%A Bo Han
%A Xiaowen Chu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-tang22d
%I PMLR
%P 21111--21132
%U https://proceedings.mlr.press/v162/tang22d.html
%V 162
%X In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream works focus on correcting client drift. We propose a different approach named virtual homogeneity learning (VHL) to directly “rectify” the data heterogeneity. In particular, VHL conducts FL with a virtual homogeneous dataset crafted to satisfy two conditions: containing no private information and being separable. The virtual dataset can be generated from pure noise shared across clients, aiming to calibrate the features from the heterogeneous clients. Theoretically, we prove that VHL can achieve provable generalization performance on the natural distribution. Empirically, we demonstrate that VHL endows FL with drastically improved convergence speed and generalization performance. VHL is the first attempt towards using a virtual dataset to address data heterogeneity, offering new and effective means to FL.
APA
Tang, Z., Zhang, Y., Shi, S., He, X., Han, B., & Chu, X. (2022). Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:21111-21132. Available from https://proceedings.mlr.press/v162/tang22d.html.