Constant or Logarithmic Regret in Asynchronous Multiplayer Bandits with Limited Communication

Hugo Richard, Etienne Boursier, Vianney Perchet
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:388-396, 2024.

Abstract

Multiplayer bandits have recently garnered significant attention due to their relevance in cognitive radio networks. While the existing body of literature predominantly focuses on synchronous players, real-world radio networks, such as those in IoT applications, often feature asynchronous (i.e., randomly activated) devices. This highlights the need for addressing the more challenging asynchronous multiplayer bandits problem. Our first result shows that a natural extension of UCB achieves a minimax regret of $\mathcal{O}(\sqrt{T\log(T)})$ in the centralized setting. More significantly, we introduce Cautious Greedy, which uses $\mathcal{O}(\log(T))$ communications and whose instance-dependent regret is constant if the optimal policy assigns at least one player to each arm (a situation proven to occur when arm means are sufficiently close). Otherwise, the regret is, as usual, $\log(T)$ times the sum of some inverse sub-optimality gaps. We substantiate the optimality of Cautious Greedy through lower-bound analysis based on data-dependent terms. Therefore, we establish a strong baseline for asynchronous multiplayer bandits, at least with $\mathcal{O}(\log(T))$ communications.
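The abstract's first result concerns a natural extension of UCB in the centralized setting: at each round a random subset of players is active, and a central controller assigns the active players to the arms with the highest UCB indices. The sketch below is a simplified, hedged illustration of that idea, not the paper's exact algorithm: it assumes Bernoulli arms, independent player activations with a fixed probability, and collision-free assignment of at most one player per arm; all names and parameters here are illustrative.

```python
import math
import random

def centralized_ucb(means, n_players, horizon, activation_p=0.5, seed=0):
    """Simplified centralized UCB for asynchronous multiplayer bandits.

    Each round, a random subset of players activates; the controller
    assigns active players to the arms with the highest UCB indices,
    at most one player per arm. Returns (total_reward, pull counts).
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K   # number of pulls per arm
    sums = [0.0] * K   # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # Number of active players this round (random activation)
        m = min(sum(rng.random() < activation_p for _ in range(n_players)), K)
        # UCB index per arm; unpulled arms get +inf to force exploration
        ucb = [sums[k] / counts[k] + math.sqrt(2 * math.log(t) / counts[k])
               if counts[k] > 0 else float("inf")
               for k in range(K)]
        # Assign active players to the m arms with the highest indices
        chosen = sorted(range(K), key=lambda k: ucb[k], reverse=True)[:m]
        for k in chosen:
            r = 1.0 if rng.random() < means[k] else 0.0  # Bernoulli reward
            counts[k] += 1
            sums[k] += r
            total_reward += r
    return total_reward, counts
```

With a single always-active player this reduces to standard UCB, so the better arm should dominate the pull counts over a long horizon; the constant-regret regime of Cautious Greedy, by contrast, relies on the optimal policy covering every arm and is not captured by this sketch.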

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-richard24a,
  title     = {Constant or Logarithmic Regret in Asynchronous Multiplayer Bandits with Limited Communication},
  author    = {Richard, Hugo and Boursier, Etienne and Perchet, Vianney},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {388--396},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/richard24a/richard24a.pdf},
  url       = {https://proceedings.mlr.press/v238/richard24a.html},
  abstract  = {Multiplayer bandits have recently garnered significant attention due to their relevance in cognitive radio networks. While the existing body of literature predominantly focuses on synchronous players, real-world radio networks, such as those in IoT applications, often feature asynchronous (i.e., randomly activated) devices. This highlights the need for addressing the more challenging asynchronous multiplayer bandits problem. Our first result shows that a natural extension of UCB achieves a minimax regret of $\mathcal{O}(\sqrt{T\log(T)})$ in the centralized setting. More significantly, we introduce Cautious Greedy, which uses $\mathcal{O}(\log(T))$ communications and whose instance-dependent regret is constant if the optimal policy assigns at least one player to each arm (a situation proven to occur when arm means are sufficiently close). Otherwise, the regret is, as usual, $\log(T)$ times the sum of some inverse sub-optimality gaps. We substantiate the optimality of Cautious Greedy through lower-bound analysis based on data-dependent terms. Therefore, we establish a strong baseline for asynchronous multiplayer bandits, at least with $\mathcal{O}(\log(T))$ communications.}
}
Endnote
%0 Conference Paper
%T Constant or Logarithmic Regret in Asynchronous Multiplayer Bandits with Limited Communication
%A Hugo Richard
%A Etienne Boursier
%A Vianney Perchet
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-richard24a
%I PMLR
%P 388--396
%U https://proceedings.mlr.press/v238/richard24a.html
%V 238
%X Multiplayer bandits have recently garnered significant attention due to their relevance in cognitive radio networks. While the existing body of literature predominantly focuses on synchronous players, real-world radio networks, such as those in IoT applications, often feature asynchronous (i.e., randomly activated) devices. This highlights the need for addressing the more challenging asynchronous multiplayer bandits problem. Our first result shows that a natural extension of UCB achieves a minimax regret of $\mathcal{O}(\sqrt{T\log(T)})$ in the centralized setting. More significantly, we introduce Cautious Greedy, which uses $\mathcal{O}(\log(T))$ communications and whose instance-dependent regret is constant if the optimal policy assigns at least one player to each arm (a situation proven to occur when arm means are sufficiently close). Otherwise, the regret is, as usual, $\log(T)$ times the sum of some inverse sub-optimality gaps. We substantiate the optimality of Cautious Greedy through lower-bound analysis based on data-dependent terms. Therefore, we establish a strong baseline for asynchronous multiplayer bandits, at least with $\mathcal{O}(\log(T))$ communications.
APA
Richard, H., Boursier, E. & Perchet, V. (2024). Constant or Logarithmic Regret in Asynchronous Multiplayer Bandits with Limited Communication. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:388-396. Available from https://proceedings.mlr.press/v238/richard24a.html.