Cooperative Multi-Agent Bandits with Heavy Tails

Abhimanyu Dubey, Alex ‘Sandy’ Pentland
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2730-2739, 2020.

Abstract

We study the heavy-tailed stochastic bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem, while communicating on a network with delays. Existing algorithms for the stochastic bandit in this setting utilize confidence intervals arising from an averaging-based communication protocol known as running consensus, which does not lend itself to robust estimation in heavy-tailed settings. We propose MP-UCB, a decentralized multi-agent algorithm for the cooperative stochastic bandit that incorporates robust estimation with a message-passing protocol. We prove optimal regret bounds for MP-UCB for several problem settings, and also demonstrate its superiority over existing methods. Furthermore, we establish the first lower bounds for the cooperative bandit problem, in addition to providing efficient algorithms for robust bandit estimation of location.
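
The key idea in the abstract is to replace the empirical mean with a robust location estimator whose confidence interval stays tight under heavy tails. As a rough, single-agent illustration of that technique (not the paper's MP-UCB, which additionally runs a message-passing protocol across the agent network), the sketch below computes a truncated-mean UCB index for rewards with a bounded (1+eps)-th raw moment; the function names, the moment bound u, and the confidence schedule are illustrative assumptions.

```python
import numpy as np

def truncated_mean(rewards, u, eps, delta):
    """Truncated empirical mean for heavy-tailed samples whose (1+eps)-th
    raw moment is assumed bounded by u.  Each sample is zeroed out if it
    exceeds a threshold that grows with its index, the standard
    truncated-mean construction for heavy-tailed bandits."""
    rewards = np.asarray(rewards, dtype=float)
    idx = np.arange(1, len(rewards) + 1)
    thresh = (u * idx / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    clipped = np.where(np.abs(rewards) <= thresh, rewards, 0.0)
    return clipped.mean()

def robust_ucb_index(rewards, t, u=1.0, eps=1.0, delta=None):
    """UCB index = truncated mean + confidence width of order
    (log(1/delta) / n)^(eps/(1+eps)), rather than the sub-Gaussian sqrt rate."""
    n = len(rewards)
    if n == 0:
        return np.inf                      # force at least one pull per arm
    if delta is None:
        delta = 1.0 / (t ** 2)             # a common anytime confidence schedule
    width = 4.0 * u ** (1.0 / (1.0 + eps)) * (np.log(1.0 / delta) / n) ** (eps / (1.0 + eps))
    return truncated_mean(rewards, u, eps, delta) + width

# Hypothetical usage: two heavy-tailed arms drawn from numpy's Lomax
# (Pareto II) distribution; a smaller shape parameter means a heavier tail.
rng = np.random.default_rng(0)
shapes = [2.5, 3.0]
history = [[] for _ in shapes]
for t in range(1, 501):
    indices = [robust_ucb_index(history[k], t) for k in range(len(shapes))]
    k = int(np.argmax(indices))
    history[k].append(rng.pareto(shapes[k]))   # heavy-tailed reward draw
```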

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-dubey20a,
  title     = {Cooperative Multi-Agent Bandits with Heavy Tails},
  author    = {Dubey, Abhimanyu and Pentland, Alex `Sandy'},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {2730--2739},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/dubey20a/dubey20a.pdf},
  url       = {https://proceedings.mlr.press/v119/dubey20a.html},
  abstract  = {We study the heavy-tailed stochastic bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem, while communicating on a network with delays. Existing algorithms for the stochastic bandit in this setting utilize confidence intervals arising from an averaging-based communication protocol known as running consensus, that does not lend itself to robust estimation for heavy-tailed settings. We propose MP-UCB, a decentralized multi-agent algorithm for the cooperative stochastic bandit that incorporates robust estimation with a message-passing protocol. We prove optimal regret bounds for MP-UCB for several problem settings, and also demonstrate its superiority to existing methods. Furthermore, we establish the first lower bounds for the cooperative bandit problem, in addition to providing efficient algorithms for robust bandit estimation of location.}
}
Endnote
%0 Conference Paper
%T Cooperative Multi-Agent Bandits with Heavy Tails
%A Abhimanyu Dubey
%A Alex ‘Sandy’ Pentland
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-dubey20a
%I PMLR
%P 2730--2739
%U https://proceedings.mlr.press/v119/dubey20a.html
%V 119
%X We study the heavy-tailed stochastic bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem, while communicating on a network with delays. Existing algorithms for the stochastic bandit in this setting utilize confidence intervals arising from an averaging-based communication protocol known as running consensus, that does not lend itself to robust estimation for heavy-tailed settings. We propose MP-UCB, a decentralized multi-agent algorithm for the cooperative stochastic bandit that incorporates robust estimation with a message-passing protocol. We prove optimal regret bounds for MP-UCB for several problem settings, and also demonstrate its superiority to existing methods. Furthermore, we establish the first lower bounds for the cooperative bandit problem, in addition to providing efficient algorithms for robust bandit estimation of location.
APA
Dubey, A. & Pentland, A.‘.. (2020). Cooperative Multi-Agent Bandits with Heavy Tails. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2730-2739 Available from https://proceedings.mlr.press/v119/dubey20a.html.