Optimal Regret Bounds for Collaborative Learning in Bandits

Amitis Shidani, Sattar Vakili
Proceedings of The 35th International Conference on Algorithmic Learning Theory, PMLR 237:1013-1029, 2024.

Abstract

We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order-optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.
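The mixed-reward notion in the abstract can be made concrete with a small numerical sketch. The matrices below are illustrative values (not from the paper): each agent's mixed reward for an arm is a weighted average of that arm's expected rewards across all agents, and each agent's optimal arm maximizes this mixed reward.

```python
import numpy as np

# Hypothetical example: 2 agents, 3 arms (values are made up for illustration).
# mu[n, k]: expected reward of arm k as observed by agent n.
mu = np.array([
    [0.9, 0.2, 0.5],
    [0.1, 0.8, 0.4],
])

# W[m, n]: weight agent m places on agent n's rewards (each row sums to 1).
W = np.array([
    [0.7, 0.3],
    [0.4, 0.6],
])

# mixed[m, k]: expected mixed reward of arm k for agent m,
# i.e. the weighted average of arm k's rewards across all agents.
mixed = W @ mu

# Each agent's optimal arm maximizes its expected mixed reward.
optimal_arms = mixed.argmax(axis=1)
print(optimal_arms)  # -> [0 1]
```

Note that agent 1's optimal arm (arm 1) differs from the arm that is best under its own local rewards alone whenever the weights shift mass toward other agents, which is why communication through the central controller matters in this model.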

Cite this Paper


BibTeX
@InProceedings{pmlr-v237-shidani24a,
  title     = {Optimal Regret Bounds for Collaborative Learning in Bandits},
  author    = {Shidani, Amitis and Vakili, Sattar},
  booktitle = {Proceedings of The 35th International Conference on Algorithmic Learning Theory},
  pages     = {1013--1029},
  year      = {2024},
  editor    = {Vernade, Claire and Hsu, Daniel},
  volume    = {237},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v237/shidani24a/shidani24a.pdf},
  url       = {https://proceedings.mlr.press/v237/shidani24a.html},
  abstract  = {We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.}
}
Endnote
%0 Conference Paper
%T Optimal Regret Bounds for Collaborative Learning in Bandits
%A Amitis Shidani
%A Sattar Vakili
%B Proceedings of The 35th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Claire Vernade
%E Daniel Hsu
%F pmlr-v237-shidani24a
%I PMLR
%P 1013--1029
%U https://proceedings.mlr.press/v237/shidani24a.html
%V 237
%X We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.
APA
Shidani, A. & Vakili, S. (2024). Optimal Regret Bounds for Collaborative Learning in Bandits. Proceedings of The 35th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 237:1013-1029. Available from https://proceedings.mlr.press/v237/shidani24a.html.