Multi-agent Multi-armed Bandit Regret Complexity and Optimality

Mengfan Xu, Diego Klabjan
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1639-1647, 2025.

Abstract

The Multi-armed Bandit problem motivates methods with provable upper bounds on regret, and the corresponding lower bounds have also been extensively studied in this context. Recently, the Multi-agent Multi-armed Bandit problem has gained significant traction in various domains, where individual clients face bandit problems in a distributed manner and the objective is overall system performance, typically measured by regret. While efficient algorithms with regret upper bounds have emerged, limited attention has been given to the corresponding regret lower bounds, except for a recent lower bound for the adversarial setting, which, however, has a gap with the best known upper bounds. To this end, we herein provide the first comprehensive study on regret lower bounds across different settings and establish their tightness. Specifically, when the graphs exhibit good connectivity properties and the rewards are stochastically distributed, we demonstrate a lower bound of order $\Omega(\log T)$ for instance-dependent bounds and $\Omega(\sqrt{T})$ for mean-gap independent bounds, both of which are tight. Assuming adversarial rewards, we establish a lower bound of $\Omega(T^{\frac{2}{3}})$ for connected graphs, thereby bridging the gap between the lower and upper bounds in prior work. We also show a linear regret lower bound when the graph is disconnected. These lower bounds are made possible through our newly constructed hard instances. In the numerical study, we assess the performance of various algorithms on these hard instances. While previous works have explored these settings via upper bounds, we provide a thorough study of tight lower bounds.
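
To make the $\Omega(\sqrt{T})$ rate concrete: hard instances behind mean-gap independent lower bounds typically shrink the gap between the best and second-best arm at rate $T^{-1/2}$, so no learner can both detect the gap quickly and avoid accumulating regret. Below is a minimal, illustrative Python sketch, not the paper's code or its algorithms: M independent agents each run UCB1 on such a two-armed Gaussian instance, and group regret is summed over agents. All function names and parameter choices here are our own assumptions for illustration.

# Minimal sketch (illustrative only): group pseudo-regret of M independent
# UCB1 agents on a two-armed instance whose gap shrinks like 1/sqrt(T).
import numpy as np

def run_ucb_agents(T=10_000, M=5, seed=0):
    rng = np.random.default_rng(seed)
    gap = 1.0 / np.sqrt(T)               # hard-instance gap ~ T^{-1/2}
    means = np.array([0.5 + gap, 0.5])   # arm 0 is optimal
    regret = 0.0
    for m in range(M):
        counts = np.zeros(2)             # pulls per arm
        sums = np.zeros(2)               # cumulative reward per arm
        for t in range(1, T + 1):
            if t <= 2:
                a = t - 1                # initialize: pull each arm once
            else:
                ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
                a = int(np.argmax(ucb))
            r = rng.normal(means[a], 1.0)   # unit-variance Gaussian reward
            counts[a] += 1
            sums[a] += r
            regret += means[0] - means[a]   # pseudo-regret of this pull
    return regret

print(run_ucb_agents())  # grows roughly like M * sqrt(T) on this instance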

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-xu25c,
  title     = {Multi-agent Multi-armed Bandit Regret Complexity and Optimality},
  author    = {Xu, Mengfan and Klabjan, Diego},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {1639--1647},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/xu25c/xu25c.pdf},
  url       = {https://proceedings.mlr.press/v258/xu25c.html},
  abstract  = {The Multi-armed Bandit problem motivates methods with provable upper bounds on regret, and the corresponding lower bounds have also been extensively studied in this context. Recently, the Multi-agent Multi-armed Bandit problem has gained significant traction in various domains, where individual clients face bandit problems in a distributed manner and the objective is overall system performance, typically measured by regret. While efficient algorithms with regret upper bounds have emerged, limited attention has been given to the corresponding regret lower bounds, except for a recent lower bound for the adversarial setting, which, however, has a gap with the best known upper bounds. To this end, we herein provide the first comprehensive study on regret lower bounds across different settings and establish their tightness. Specifically, when the graphs exhibit good connectivity properties and the rewards are stochastically distributed, we demonstrate a lower bound of order $\Omega(\log T)$ for instance-dependent bounds and $\Omega(\sqrt{T})$ for mean-gap independent bounds, both of which are tight. Assuming adversarial rewards, we establish a lower bound of $\Omega(T^{\frac{2}{3}})$ for connected graphs, thereby bridging the gap between the lower and upper bounds in prior work. We also show a linear regret lower bound when the graph is disconnected. These lower bounds are made possible through our newly constructed hard instances. In the numerical study, we assess the performance of various algorithms on these hard instances. While previous works have explored these settings via upper bounds, we provide a thorough study of tight lower bounds.}
}
Endnote
%0 Conference Paper
%T Multi-agent Multi-armed Bandit Regret Complexity and Optimality
%A Mengfan Xu
%A Diego Klabjan
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-xu25c
%I PMLR
%P 1639--1647
%U https://proceedings.mlr.press/v258/xu25c.html
%V 258
%X The Multi-armed Bandit problem motivates methods with provable upper bounds on regret, and the corresponding lower bounds have also been extensively studied in this context. Recently, the Multi-agent Multi-armed Bandit problem has gained significant traction in various domains, where individual clients face bandit problems in a distributed manner and the objective is overall system performance, typically measured by regret. While efficient algorithms with regret upper bounds have emerged, limited attention has been given to the corresponding regret lower bounds, except for a recent lower bound for the adversarial setting, which, however, has a gap with the best known upper bounds. To this end, we herein provide the first comprehensive study on regret lower bounds across different settings and establish their tightness. Specifically, when the graphs exhibit good connectivity properties and the rewards are stochastically distributed, we demonstrate a lower bound of order $\Omega(\log T)$ for instance-dependent bounds and $\Omega(\sqrt{T})$ for mean-gap independent bounds, both of which are tight. Assuming adversarial rewards, we establish a lower bound of $\Omega(T^{\frac{2}{3}})$ for connected graphs, thereby bridging the gap between the lower and upper bounds in prior work. We also show a linear regret lower bound when the graph is disconnected. These lower bounds are made possible through our newly constructed hard instances. In the numerical study, we assess the performance of various algorithms on these hard instances. While previous works have explored these settings via upper bounds, we provide a thorough study of tight lower bounds.
APA
Xu, M., & Klabjan, D. (2025). Multi-agent Multi-armed Bandit Regret Complexity and Optimality. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1639-1647. Available from https://proceedings.mlr.press/v258/xu25c.html.