Multi-agent Stochastic Bandits Robust to Adversarial Corruptions

Fatemeh Ghaffari, Xuchuang Wang, Jinhang Zuo, Mohammad Hajiesmaili
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:12-25, 2025.

Abstract

We study cooperative multi-agent multi-armed bandits with adversarial corruption in a heterogeneous setting, where each agent has access to a subset of the full arm set, and the adversary can corrupt the reward observations for all agents. The objective is to maximize the cumulative total reward of all agents (and not be misled by the adversary). We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruption. For this newly devised algorithm, we demonstrate that an adversary with an unknown corruption budget $C$ only incurs an additive $O((L / L_{\min}) C)$ term to the standard regret of the model in non-corruption settings, where $L$ is the total number of agents, and $L_{\min}$ is the minimum number of agents with mutual access to an arm. As independent side contributions, our algorithm improves the state-of-the-art regret bounds when reducing to both the single-agent and homogeneous multi-agent scenarios, tightening multiplicative $K$ (the number of arms) and $L$ (the number of agents) factors, respectively. Lastly, we conduct numerical simulations to corroborate the superiority of our proposed algorithm.
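As a purely illustrative sketch of how the additive corruption term scales (this is not the paper's algorithm; the helper function and its name are hypothetical), the bound states that an adversary with budget $C$ adds only an $O((L / L_{\min}) C)$ term to the uncorrupted regret:

```python
def corruption_overhead(L: int, L_min: int, C: float) -> float:
    """Additive regret overhead, up to constants, for L agents,
    minimum mutual arm coverage L_min, and corruption budget C.
    Evaluates the (L / L_min) * C scaling from the stated bound."""
    return (L / L_min) * C

# Example: 10 agents, every arm shared by at least 2 agents, budget C = 100.
# The adversary's influence on regret grows only linearly in C,
# scaled by the agent-coverage ratio L / L_min = 5.
print(corruption_overhead(10, 2, 100))
```

Note that when every agent can access every arm ($L_{\min} = L$), the overhead reduces to $O(C)$, matching the homogeneous setting.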

Cite this Paper


BibTeX
@InProceedings{pmlr-v283-ghaffari25a,
  title     = {Multi-agent Stochastic Bandits Robust to Adversarial Corruptions},
  author    = {Ghaffari, Fatemeh and Wang, Xuchuang and Zuo, Jinhang and Hajiesmaili, Mohammad},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  pages     = {12--25},
  year      = {2025},
  editor    = {Ozay, Necmiye and Balzano, Laura and Panagou, Dimitra and Abate, Alessandro},
  volume    = {283},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--06 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v283/main/assets/ghaffari25a/ghaffari25a.pdf},
  url       = {https://proceedings.mlr.press/v283/ghaffari25a.html},
  abstract  = {We study cooperative multi-agent multi-armed bandits with adversarial corruption in a heterogeneous setting, where each agent has access to a subset of the full arm set, and the adversary can corrupt the reward observations for all agents. The objective is to maximize the cumulative total reward of all agents (and not be misled by the adversary). We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruption. For this newly devised algorithm, we demonstrate that an adversary with an unknown corruption budget $C$ only incurs an additive $O((L / L_{\min}) C)$ term to the standard regret of the model in non-corruption settings, where $L$ is the total number of agents, and $L_{\min}$ is the minimum number of agents with mutual access to an arm. As independent side contributions, our algorithm improves the state-of-the-art regret bounds when reducing to both the single-agent and homogeneous multi-agent scenarios, tightening multiplicative $K$ (the number of arms) and $L$ (the number of agents) factors, respectively. Lastly, we conduct numerical simulations to corroborate the superiority of our proposed algorithm.}
}
Endnote
%0 Conference Paper
%T Multi-agent Stochastic Bandits Robust to Adversarial Corruptions
%A Fatemeh Ghaffari
%A Xuchuang Wang
%A Jinhang Zuo
%A Mohammad Hajiesmaili
%B Proceedings of the 7th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Necmiye Ozay
%E Laura Balzano
%E Dimitra Panagou
%E Alessandro Abate
%F pmlr-v283-ghaffari25a
%I PMLR
%P 12--25
%U https://proceedings.mlr.press/v283/ghaffari25a.html
%V 283
%X We study cooperative multi-agent multi-armed bandits with adversarial corruption in a heterogeneous setting, where each agent has access to a subset of the full arm set, and the adversary can corrupt the reward observations for all agents. The objective is to maximize the cumulative total reward of all agents (and not be misled by the adversary). We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruption. For this newly devised algorithm, we demonstrate that an adversary with an unknown corruption budget $C$ only incurs an additive $O((L / L_{\min}) C)$ term to the standard regret of the model in non-corruption settings, where $L$ is the total number of agents, and $L_{\min}$ is the minimum number of agents with mutual access to an arm. As independent side contributions, our algorithm improves the state-of-the-art regret bounds when reducing to both the single-agent and homogeneous multi-agent scenarios, tightening multiplicative $K$ (the number of arms) and $L$ (the number of agents) factors, respectively. Lastly, we conduct numerical simulations to corroborate the superiority of our proposed algorithm.
APA
Ghaffari, F., Wang, X., Zuo, J. & Hajiesmaili, M. (2025). Multi-agent Stochastic Bandits Robust to Adversarial Corruptions. Proceedings of the 7th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 283:12-25. Available from https://proceedings.mlr.press/v283/ghaffari25a.html.
