Near-Optimal Regret Bounds for Federated Multi-armed Bandits with Fully Distributed Communication
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:4959-4981, 2025.
Abstract
In this paper, we study federated multi-armed bandit (FMAB) problems in which agents can communicate only with their neighbors. All agents aim to solve a common multi-armed bandit (MAB) problem so as to minimize their individual regrets, while the group regret is minimized as well. In a federated bandit problem, an agent cannot estimate the global reward means of the arms from its local observations alone; hence, bandit learning algorithms typically adopt a consensus estimation strategy to handle this heterogeneity. However, existing algorithms for fully distributed communication graphs have achieved only suboptimal regret for this problem. To address this, we propose a fully distributed online consensus estimation algorithm (CES) that estimates the global means without bias. Integrating this consensus estimator into a distributed successive elimination framework yields our federated bandit algorithm, which significantly improves both individual and group regrets over previous approaches. We also provide an in-depth analysis of the lower bound for this problem.
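To make the two ingredients of the abstract concrete, the sketch below illustrates (i) gossip-style consensus averaging over a fully distributed graph, by which agents' estimates converge to the global arm means using only neighbor communication, and (ii) a generic successive-elimination-style check on top of those estimates. This is a minimal illustration under assumed choices (a ring topology, a doubly stochastic weight matrix `W`, noiseless local means, a placeholder confidence width); it is not the paper's CES algorithm or its exact elimination rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_arms = 4, 5

# Heterogeneous local reward means; the global mean is their average.
# No single agent can recover global_means from its own row alone.
local_means = rng.uniform(0, 1, size=(n_agents, n_arms))
global_means = local_means.mean(axis=0)

# Doubly stochastic mixing matrix for a ring graph (assumed topology):
# each agent averages with its two neighbors and itself.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

# Each agent starts from its local estimate (noiseless here for
# simplicity) and repeatedly averages with neighbors; the iterates
# converge to the global mean without any agent ever accessing
# non-neighbor observations.
x = local_means.copy()
for _ in range(50):
    x = W @ x  # one synchronous gossip/consensus step

print("consensus error:", np.max(np.abs(x - global_means)))  # near zero

# A successive-elimination-style check (generic, not the paper's rule):
# drop arms whose consensus estimate falls a confidence width below
# the best arm's estimate.
conf_width = 0.05  # placeholder; a real width depends on sample counts
est = x[0]         # any agent's post-consensus estimate
active = [a for a in range(n_arms) if est[a] >= est.max() - 2 * conf_width]
print("active arms:", active)
```

In a full federated bandit algorithm, the consensus step would run on noisy empirical means and the confidence width would shrink with the number of pulls, so the active set provably narrows to the globally optimal arm.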