Coordination without communication: optimal regret in two players multi-armed bandits

Sébastien Bubeck, Thomas Budzinski
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:916-939, 2020.

Abstract

We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. Under the assumption that shared randomness is available, we propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.
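To make the setting concrete, here is a minimal sketch of the collision model the abstract describes, not the authors' strategy: two players face the same three-armed Bernoulli bandit, a collision (both pulling the same arm) is assumed to yield zero reward, and a shared random seed lets the players draw a common random permutation of the arms without communicating, so they never collide. The arm means and the zero-reward collision rule are illustrative assumptions.

```python
import random

def play_round(means, arm1, arm2, rng):
    """One round of the two-player bandit. A collision (both players
    pulling the same arm) is assumed to yield zero reward for both."""
    if arm1 == arm2:
        return 0.0, 0.0
    r1 = 1.0 if rng.random() < means[arm1] else 0.0
    r2 = 1.0 if rng.random() < means[arm2] else 0.0
    return r1, r2

def shared_random_arms(shared_rng, n_arms=3):
    """With shared randomness, both players can draw the same random
    permutation of the arms without communicating; player 1 takes the
    first entry, player 2 the second, so their arms always differ."""
    perm = shared_rng.sample(range(n_arms), n_arms)
    return perm[0], perm[1]

means = [0.9, 0.6, 0.3]            # hypothetical Bernoulli arm means
env_rng = random.Random(0)          # environment's reward randomness
shared_rng = random.Random(42)      # randomness shared by both players

collisions = 0
for t in range(1000):
    a1, a2 = shared_random_arms(shared_rng)
    r1, r2 = play_round(means, a1, a2, env_rng)
    collisions += (a1 == a2)
print(collisions)  # 0: the shared permutation rules out collisions
```

This only illustrates why shared randomness makes collision-free coordination possible at all; achieving near-optimal regret on top of it is the content of the paper.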

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-bubeck20a,
  title     = {Coordination without communication: optimal regret in two players multi-armed bandits},
  author    = {Bubeck, S\'ebastien and Budzinski, Thomas},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages     = {916--939},
  year      = {2020},
  editor    = {Abernethy, Jacob and Agarwal, Shivani},
  volume    = {125},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v125/bubeck20a/bubeck20a.pdf},
  url       = {https://proceedings.mlr.press/v125/bubeck20a.html},
  abstract  = {We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. Under the assumption that shared randomness is available, we propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.}
}
Endnote
%0 Conference Paper
%T Coordination without communication: optimal regret in two players multi-armed bandits
%A Sébastien Bubeck
%A Thomas Budzinski
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-bubeck20a
%I PMLR
%P 916--939
%U https://proceedings.mlr.press/v125/bubeck20a.html
%V 125
%X We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. Under the assumption that shared randomness is available, we propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.
APA
Bubeck, S. & Budzinski, T. (2020). Coordination without communication: optimal regret in two players multi-armed bandits. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:916-939. Available from https://proceedings.mlr.press/v125/bubeck20a.html.