Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos

Tianyi Zhang, Yu Cao, Dianbo Liu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:74565-74582, 2025.

Abstract

Federated learning (FL), aimed at leveraging vast distributed datasets, confronts a crucial challenge: the heterogeneity of data across different silos. While previous studies have explored discrete representations to enhance model generalization across minor distributional shifts, these approaches often struggle to adapt to new data silos with significantly divergent distributions. In response, we have identified that models derived from FL exhibit markedly increased uncertainty when applied to data silos with unfamiliar distributions. Consequently, we propose an innovative yet straightforward iterative framework, termed Uncertainty-Based Extensible-Codebook Federated Learning (UEFL). This framework dynamically maps latent features to trainable discrete vectors, assesses the uncertainty, and specifically extends the discretization dictionary or codebook for silos exhibiting high uncertainty. Our approach aims to simultaneously enhance accuracy and reduce uncertainty by explicitly addressing the diversity of data distributions, all while maintaining minimal computational overhead in environments characterized by heterogeneous data silos. Extensive experiments across multiple datasets demonstrate that UEFL outperforms state-of-the-art methods, achieving significant improvements in accuracy (by 3%–22.1%) and uncertainty reduction (by 38.83%–96.24%). The source code is available at https://github.com/destiny301/uefl.
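The core loop described in the abstract — quantize latents against a shared codebook, measure uncertainty per silo, and grow the codebook for high-uncertainty silos — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); the entropy-based uncertainty proxy, the threshold, and the initialize-from-silo-latents scheme are illustrative assumptions.

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector to its nearest codebook entry.

    z:        (num_latents, dim) continuous latents
    codebook: (codebook_size, dim) trainable discrete vectors
    Returns the quantized latents and the chosen indices.
    (The straight-through gradient trick used in training is omitted here.)
    """
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

def predictive_entropy(probs):
    """Mean predictive entropy over a silo's softmax outputs, as an uncertainty proxy."""
    return float(-(probs * np.log(probs + 1e-12)).sum(-1).mean())

def maybe_extend(codebook, z_silo, uncertainty, threshold, n_new=8, rng=None):
    """If a silo's uncertainty exceeds the threshold, append n_new entries
    initialized from that silo's own latents; otherwise leave the codebook as-is."""
    if uncertainty <= threshold:
        return codebook
    rng = rng or np.random.default_rng(0)
    picks = rng.choice(len(z_silo), size=n_new, replace=len(z_silo) < n_new)
    return np.concatenate([codebook, z_silo[picks]], axis=0)
```

In an iterative run, each round would quantize, train, score every silo with the uncertainty proxy, and call the extension step only for silos above the threshold, so well-covered silos keep the compact shared codebook.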

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhang25f,
  title     = {Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos},
  author    = {Zhang, Tianyi and Cao, Yu and Liu, Dianbo},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {74565--74582},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhang25f/zhang25f.pdf},
  url       = {https://proceedings.mlr.press/v267/zhang25f.html},
  abstract  = {Federated learning (FL), aimed at leveraging vast distributed datasets, confronts a crucial challenge: the heterogeneity of data across different silos. While previous studies have explored discrete representations to enhance model generalization across minor distributional shifts, these approaches often struggle to adapt to new data silos with significantly divergent distributions. In response, we have identified that models derived from FL exhibit markedly increased uncertainty when applied to data silos with unfamiliar distributions. Consequently, we propose an innovative yet straightforward iterative framework, termed Uncertainty-Based Extensible-Codebook Federated Learning (UEFL). This framework dynamically maps latent features to trainable discrete vectors, assesses the uncertainty, and specifically extends the discretization dictionary or codebook for silos exhibiting high uncertainty. Our approach aims to simultaneously enhance accuracy and reduce uncertainty by explicitly addressing the diversity of data distributions, all while maintaining minimal computational overhead in environments characterized by heterogeneous data silos. Extensive experiments across multiple datasets demonstrate that UEFL outperforms state-of-the-art methods, achieving significant improvements in accuracy (by 3%–22.1%) and uncertainty reduction (by 38.83%–96.24%). The source code is available at https://github.com/destiny301/uefl.}
}
Endnote
%0 Conference Paper
%T Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos
%A Tianyi Zhang
%A Yu Cao
%A Dianbo Liu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhang25f
%I PMLR
%P 74565--74582
%U https://proceedings.mlr.press/v267/zhang25f.html
%V 267
%X Federated learning (FL), aimed at leveraging vast distributed datasets, confronts a crucial challenge: the heterogeneity of data across different silos. While previous studies have explored discrete representations to enhance model generalization across minor distributional shifts, these approaches often struggle to adapt to new data silos with significantly divergent distributions. In response, we have identified that models derived from FL exhibit markedly increased uncertainty when applied to data silos with unfamiliar distributions. Consequently, we propose an innovative yet straightforward iterative framework, termed Uncertainty-Based Extensible-Codebook Federated Learning (UEFL). This framework dynamically maps latent features to trainable discrete vectors, assesses the uncertainty, and specifically extends the discretization dictionary or codebook for silos exhibiting high uncertainty. Our approach aims to simultaneously enhance accuracy and reduce uncertainty by explicitly addressing the diversity of data distributions, all while maintaining minimal computational overhead in environments characterized by heterogeneous data silos. Extensive experiments across multiple datasets demonstrate that UEFL outperforms state-of-the-art methods, achieving significant improvements in accuracy (by 3%–22.1%) and uncertainty reduction (by 38.83%–96.24%). The source code is available at https://github.com/destiny301/uefl.
APA
Zhang, T., Cao, Y. & Liu, D. (2025). Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:74565-74582. Available from https://proceedings.mlr.press/v267/zhang25f.html.
