On-Demand Communication for Asynchronous Multi-Agent Bandits

Yu-Zhen Janice Chen, Lin Yang, Xuchuang Wang, Xutong Liu, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:3903-3930, 2023.

Abstract

This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously – agent pull times and rates are unknown, irregular, and heterogeneous – and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.
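The paper's setting — asynchronous agents learning the same K-armed bandit while exchanging reward information — can be illustrated with a toy simulation. The sketch below is not the paper's ODC protocol or its AAE variant; it is a minimal stand-in in which two UCB1 agents with different (hypothetical) pull rates periodically exchange buffered observations, so the slower agent benefits from the faster agent's samples.

```python
import math
import random

class UCBAgent:
    """A UCB1 learner over K arms that can absorb reward samples shared by
    peers. Illustrative toy only, not the ODC protocol from the paper."""
    def __init__(self, k):
        self.k = k
        self.counts = [0] * k     # observations per arm (own + received)
        self.sums = [0.0] * k     # reward sums per arm
        self.outbox = []          # (arm, reward) pairs not yet shared

    def select_arm(self):
        t = sum(self.counts) + 1
        for a in range(self.k):   # initialization: try each arm once
            if self.counts[a] == 0:
                return a
        # UCB1 index: empirical mean + exploration bonus
        return max(range(self.k),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2 * math.log(t) / self.counts[a]))

    def pull(self, means, rng):
        a = self.select_arm()
        r = 1.0 if rng.random() < means[a] else 0.0  # Bernoulli reward
        self.record(a, r)
        self.outbox.append((a, r))
        return a, r

    def record(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def sync(a, b):
    """Exchange buffered observations between two agents — a crude stand-in
    for an on-demand message exchange."""
    for arm, r in a.outbox:
        b.record(arm, r)
    for arm, r in b.outbox:
        a.record(arm, r)
    a.outbox.clear()
    b.outbox.clear()

def simulate(means, rounds, seed=0):
    rng = random.Random(seed)
    fast, slow = UCBAgent(len(means)), UCBAgent(len(means))
    for t in range(rounds):
        fast.pull(means, rng)      # the fast agent acts every round
        if t % 5 == 0:
            slow.pull(means, rng)  # the slow agent acts at 1/5 the rate
        if t % 20 == 0:
            sync(fast, slow)       # occasional exchange of new samples
    return fast, slow
```

The pull rates (every round vs. every fifth round) and the fixed sync period are arbitrary choices for illustration; the point of ODC is precisely that the communication schedule should adapt to the agents' empirical pull times rather than being fixed in advance.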

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-chen23c,
  title     = {On-Demand Communication for Asynchronous Multi-Agent Bandits},
  author    = {Chen, Yu-Zhen Janice and Yang, Lin and Wang, Xuchuang and Liu, Xutong and Hajiesmaili, Mohammad and Lui, John C. S. and Towsley, Don},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {3903--3930},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/chen23c/chen23c.pdf},
  url       = {https://proceedings.mlr.press/v206/chen23c.html},
  abstract  = {This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously – agent pull times and rates are unknown, irregular, and heterogeneous – and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.}
}
Endnote
%0 Conference Paper
%T On-Demand Communication for Asynchronous Multi-Agent Bandits
%A Yu-Zhen Janice Chen
%A Lin Yang
%A Xuchuang Wang
%A Xutong Liu
%A Mohammad Hajiesmaili
%A John C. S. Lui
%A Don Towsley
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-chen23c
%I PMLR
%P 3903--3930
%U https://proceedings.mlr.press/v206/chen23c.html
%V 206
%X This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously – agent pull times and rates are unknown, irregular, and heterogeneous – and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.
APA
Chen, Y.J., Yang, L., Wang, X., Liu, X., Hajiesmaili, M., Lui, J.C.S. & Towsley, D. (2023). On-Demand Communication for Asynchronous Multi-Agent Bandits. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:3903-3930. Available from https://proceedings.mlr.press/v206/chen23c.html.
