Efficient Heterogeneity-Aware Federated Active Data Selection

Ying-Peng Tang, Chao Ren, Xiaoli Tang, Sheng-Jun Huang, Lizhen Cui, Han Yu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:58931-58943, 2025.

Abstract

Federated Active Learning (FAL) aims to learn an effective global model while minimizing label queries. Owing to privacy requirements, it is challenging to design effective active data selection schemes due to the lack of cross-client query information. In this paper, we bridge this important gap by proposing the Federated Active data selection by LEverage score sampling (FALE) method. It is designed for regression tasks in the presence of non-i.i.d. client data to enable the server to select data globally in a privacy-preserving manner. Based on FedSVD, FALE aims to estimate the utility of unlabeled data and perform data selection via leverage score sampling. Besides, a secure model learning framework is designed for federated regression tasks to exploit supervision. FALE can operate without requiring an initial labeled set and select the instances in a single pass, significantly reducing communication overhead. Theoretical analysis establishes the query complexity for FALE to achieve constant factor approximation and relative error approximation. Extensive experiments on 11 benchmark datasets demonstrate significant improvements of FALE over existing state-of-the-art methods.
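
For readers unfamiliar with the selection principle named in the abstract, the following is a minimal, centralized sketch of generic leverage score sampling for linear regression. It is not the FALE algorithm: it involves no federation, no FedSVD, and no privacy protection, and every function name, the synthetic data, and the query budget of 60 are assumptions made purely for illustration. It shows the point the abstract relies on: the scores are computed from the unlabeled features alone, so no initial labeled set is needed and all queries can be chosen in a single pass.

```python
import numpy as np


def leverage_scores(X):
    """Row leverage scores of X: squared row norms of its left singular vectors."""
    # Thin SVD: X = U @ diag(s) @ Vt, where U has orthonormal columns.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Keep only directions whose singular values are numerically non-zero.
    tol = s.max() * max(X.shape) * np.finfo(s.dtype).eps
    rank = int(np.sum(s > tol))
    return np.sum(U[:, :rank] ** 2, axis=1)


def sample_by_leverage(X, n_queries, rng):
    """Pick rows to label with probability proportional to their leverage score."""
    scores = leverage_scores(X)
    probs = scores / scores.sum()
    idx = rng.choice(X.shape[0], size=n_queries, replace=True, p=probs)
    # Importance weights that keep the subsampled least-squares objective unbiased.
    weights = 1.0 / np.sqrt(n_queries * probs[idx])
    return idx, weights


def weighted_least_squares(X, y_queried, idx, weights):
    """Fit a linear model using only the queried rows and their labels."""
    A = X[idx] * weights[:, None]
    b = y_queried * weights
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef


# Synthetic illustration (all values below are assumptions, not paper data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.01 * rng.normal(size=500)

idx, weights = sample_by_leverage(X, 60, rng)          # selection uses features only
w_hat = weighted_least_squares(X, y[idx], idx, weights)  # labels needed only for idx
```

In this sketch only the 60 sampled rows ever have their labels read; rows with high leverage (those that most constrain the regression fit) are more likely to be queried and are down-weighted accordingly.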

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-tang25i,
  title = {Efficient Heterogeneity-Aware Federated Active Data Selection},
  author = {Tang, Ying-Peng and Ren, Chao and Tang, Xiaoli and Huang, Sheng-Jun and Cui, Lizhen and Yu, Han},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {58931--58943},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/tang25i/tang25i.pdf},
  url = {https://proceedings.mlr.press/v267/tang25i.html},
  abstract = {Federated Active Learning (FAL) aims to learn an effective global model while minimizing label queries. Owing to privacy requirements, it is challenging to design effective active data selection schemes due to the lack of cross-client query information. In this paper, we bridge this important gap by proposing the Federated Active data selection by LEverage score sampling (FALE) method. It is designed for regression tasks in the presence of non-i.i.d. client data to enable the server to select data globally in a privacy-preserving manner. Based on FedSVD, FALE aims to estimate the utility of unlabeled data and perform data selection via leverage score sampling. Besides, a secure model learning framework is designed for federated regression tasks to exploit supervision. FALE can operate without requiring an initial labeled set and select the instances in a single pass, significantly reducing communication overhead. Theoretical analysis establishes the query complexity for FALE to achieve constant factor approximation and relative error approximation. Extensive experiments on 11 benchmark datasets demonstrate significant improvements of FALE over existing state-of-the-art methods.}
}
Endnote
%0 Conference Paper
%T Efficient Heterogeneity-Aware Federated Active Data Selection
%A Ying-Peng Tang
%A Chao Ren
%A Xiaoli Tang
%A Sheng-Jun Huang
%A Lizhen Cui
%A Han Yu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-tang25i
%I PMLR
%P 58931--58943
%U https://proceedings.mlr.press/v267/tang25i.html
%V 267
%X Federated Active Learning (FAL) aims to learn an effective global model while minimizing label queries. Owing to privacy requirements, it is challenging to design effective active data selection schemes due to the lack of cross-client query information. In this paper, we bridge this important gap by proposing the Federated Active data selection by LEverage score sampling (FALE) method. It is designed for regression tasks in the presence of non-i.i.d. client data to enable the server to select data globally in a privacy-preserving manner. Based on FedSVD, FALE aims to estimate the utility of unlabeled data and perform data selection via leverage score sampling. Besides, a secure model learning framework is designed for federated regression tasks to exploit supervision. FALE can operate without requiring an initial labeled set and select the instances in a single pass, significantly reducing communication overhead. Theoretical analysis establishes the query complexity for FALE to achieve constant factor approximation and relative error approximation. Extensive experiments on 11 benchmark datasets demonstrate significant improvements of FALE over existing state-of-the-art methods.
APA
Tang, Y., Ren, C., Tang, X., Huang, S., Cui, L. & Yu, H. (2025). Efficient Heterogeneity-Aware Federated Active Data Selection. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:58931-58943. Available from https://proceedings.mlr.press/v267/tang25i.html.
