Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents

Safwan Labbi, Daniil Tiapkin, Lorenzo Mancini, Paul Mangold, Eric Moulines
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1315-1323, 2025.

Abstract

In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde O(\sqrt{H^3 |S| |A| T / M})$, with a small additional term due to heterogeneity, where $|S|$ is the number of states, $|A|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, $\texttt{Fed-UCBVI}$ has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, $\texttt{Fed-UCBVI}$’s communication complexity only marginally increases with the number of agents.

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-labbi25a, title = {Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents}, author = {Labbi, Safwan and Tiapkin, Daniil and Mancini, Lorenzo and Mangold, Paul and Moulines, Eric}, booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics}, pages = {1315--1323}, year = {2025}, editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz}, volume = {258}, series = {Proceedings of Machine Learning Research}, month = {03--05 May}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/labbi25a/labbi25a.pdf}, url = {https://proceedings.mlr.press/v258/labbi25a.html}, abstract = {In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde O(\sqrt{H^3 |S| |A| T / M})$, with a small additional term due to heterogeneity, where $|S|$ is the number of states, $|A|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, $\texttt{Fed-UCBVI}$ has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, $\texttt{Fed-UCBVI}$’s communication complexity only marginally increases with the number of agents.} }
Endnote
%0 Conference Paper %T Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents %A Safwan Labbi %A Daniil Tiapkin %A Lorenzo Mancini %A Paul Mangold %A Eric Moulines %B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2025 %E Yingzhen Li %E Stephan Mandt %E Shipra Agrawal %E Emtiyaz Khan %F pmlr-v258-labbi25a %I PMLR %P 1315--1323 %U https://proceedings.mlr.press/v258/labbi25a.html %V 258 %X In this paper, we present the Federated Upper Confidence Bound Value Iteration algorithm ($\texttt{Fed-UCBVI}$), a novel extension of the $\texttt{UCBVI}$ algorithm (Azar et al., 2017) tailored for the federated learning framework. We prove that the regret of $\texttt{Fed-UCBVI}$ scales as $\tilde O(\sqrt{H^3 |S| |A| T / M})$, with a small additional term due to heterogeneity, where $|S|$ is the number of states, $|A|$ is the number of actions, $H$ is the episode length, $M$ is the number of agents, and $T$ is the number of episodes. Notably, in the single-agent setting, this upper bound matches the minimax lower bound up to polylogarithmic factors, while in the multi-agent scenario, $\texttt{Fed-UCBVI}$ has linear speed-up. To conduct our analysis, we introduce a new measure of heterogeneity, which may hold independent theoretical interest. Furthermore, we show that, unlike existing federated reinforcement learning approaches, $\texttt{Fed-UCBVI}$’s communication complexity only marginally increases with the number of agents.
APA
Labbi, S., Tiapkin, D., Mancini, L., Mangold, P. & Moulines, E.. (2025). Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1315-1323 Available from https://proceedings.mlr.press/v258/labbi25a.html.

Related Material