Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits

Ronshee Chawla, Daniel Vial, Sanjay Shakkottai, R. Srikant
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:4189-4217, 2023.

Abstract

The study of collaborative multi-agent bandits has attracted significant attention recently. In light of this, we initiate the study of a new collaborative setting, consisting of $N$ agents such that each agent is learning one of $M$ stochastic multi-armed bandits to minimize their group cumulative regret. We develop decentralized algorithms which facilitate collaboration between the agents under two scenarios. We characterize the performance of these algorithms by deriving per-agent cumulative regret and group regret upper bounds. We also prove lower bounds for the group regret in this setting, which demonstrate the near-optimal behavior of the proposed algorithms.
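
As a concrete illustration of the setting only (not of the paper's algorithms), the sketch below simulates $N$ agents, each assigned one of $M$ Bernoulli $K$-armed bandit instances, with group cumulative pseudo-regret measured as the sum over agents of the per-round gap to each agent's own best arm. The choice of UCB1 as a non-collaborative baseline, the Bernoulli reward model, and all parameter values are assumptions made for illustration.

# Minimal sketch of the heterogeneous multi-agent bandit setting:
# N agents, each facing one of M stochastic K-armed bandits.
# Hypothetical baseline: every agent runs UCB1 independently (no collaboration).
import numpy as np

rng = np.random.default_rng(0)
N, M, K, T = 6, 2, 5, 2000                    # agents, bandit types, arms, horizon

means = rng.uniform(0.1, 0.9, size=(M, K))    # arm means for each bandit instance
agent_type = rng.integers(0, M, size=N)       # which bandit each agent is learning

counts = np.zeros((N, K))                     # pulls per (agent, arm)
sums = np.zeros((N, K))                       # cumulative reward per (agent, arm)
group_regret = 0.0

for t in range(1, T + 1):
    for i in range(N):
        m = agent_type[i]
        if t <= K:                            # pull each arm once to initialize
            a = t - 1
        else:                                 # UCB1 index for this agent
            ucb = sums[i] / counts[i] + np.sqrt(2 * np.log(t) / counts[i])
            a = int(np.argmax(ucb))
        reward = rng.binomial(1, means[m, a])  # Bernoulli reward
        counts[i, a] += 1
        sums[i, a] += reward
        group_regret += means[m].max() - means[m, a]  # pseudo-regret increment

print(f"group cumulative regret after T={T}: {group_regret:.1f}")

The collaborative algorithms analyzed in the paper would replace the independent UCB1 updates above with message passing between agents that share a bandit instance; this baseline is included only to make the group-regret objective concrete.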

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-chawla23a,
  title     = {Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits},
  author    = {Chawla, Ronshee and Vial, Daniel and Shakkottai, Sanjay and Srikant, R.},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {4189--4217},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/chawla23a/chawla23a.pdf},
  url       = {https://proceedings.mlr.press/v202/chawla23a.html},
  abstract  = {The study of collaborative multi-agent bandits has attracted significant attention recently. In light of this, we initiate the study of a new collaborative setting, consisting of $N$ agents such that each agent is learning one of $M$ stochastic multi-armed bandits to minimize their group cumulative regret. We develop decentralized algorithms which facilitate collaboration between the agents under two scenarios. We characterize the performance of these algorithms by deriving the per agent cumulative regret and group regret upper bounds. We also prove lower bounds for the group regret in this setting, which demonstrates the near-optimal behavior of the proposed algorithms.}
}
Endnote
%0 Conference Paper
%T Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits
%A Ronshee Chawla
%A Daniel Vial
%A Sanjay Shakkottai
%A R. Srikant
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-chawla23a
%I PMLR
%P 4189--4217
%U https://proceedings.mlr.press/v202/chawla23a.html
%V 202
%X The study of collaborative multi-agent bandits has attracted significant attention recently. In light of this, we initiate the study of a new collaborative setting, consisting of $N$ agents such that each agent is learning one of $M$ stochastic multi-armed bandits to minimize their group cumulative regret. We develop decentralized algorithms which facilitate collaboration between the agents under two scenarios. We characterize the performance of these algorithms by deriving the per agent cumulative regret and group regret upper bounds. We also prove lower bounds for the group regret in this setting, which demonstrates the near-optimal behavior of the proposed algorithms.
APA
Chawla, R., Vial, D., Shakkottai, S. & Srikant, R. (2023). Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:4189-4217. Available from https://proceedings.mlr.press/v202/chawla23a.html.
