Multi-agent Stochastic Bandits Robust to Adversarial Corruptions
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:12-25, 2025.
Abstract
We study cooperative multi-agent multi-armed bandits with adversarial corruption in a heterogeneous setting, where each agent has access to a subset of the full arm set and the adversary can corrupt the reward observations of all agents. The objective is to maximize the cumulative total reward of all agents without being misled by the adversary. We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruption. For this newly devised algorithm, we demonstrate that an adversary with an unknown corruption budget $C$ induces only an additive $O((L / L_{\min}) C)$ term on top of the standard regret in the corruption-free setting, where $L$ is the total number of agents and $L_{\min}$ is the minimum number of agents sharing access to any single arm. As independent side contributions, our algorithm improves the state-of-the-art regret bounds when reduced to the single-agent and homogeneous multi-agent scenarios, tightening multiplicative factors of $K$ (the number of arms) and $L$ (the number of agents), respectively. Lastly, we conduct numerical simulations to corroborate the superiority of our proposed algorithm.
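As a concrete illustration of the reward-corruption model described above, here is a minimal Python sketch of a toy heterogeneous instance. All sizes and names ($K$, $L$, $T$, the budget $C$, the uniform placeholder policy) are illustrative assumptions; this is not the paper's algorithm, only a picture of how an adversary with a finite budget perturbs the rewards that agents observe, and how the quantity $L_{\min}$ arises from the agents' arm sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
K, L, T = 5, 4, 1000   # arms, agents, rounds
C = 50.0               # adversary's total corruption budget (unknown to the learner)

mu = rng.uniform(0.0, 1.0, size=K)                                   # true mean rewards
arm_sets = [rng.choice(K, size=3, replace=False) for _ in range(L)]  # heterogeneous access

# L_min: minimum number of agents sharing access to an arm
# (arms that no agent can access are skipped).
counts = [sum(int(a in s) for s in arm_sets) for a in range(K)]
L_min = min(c for c in counts if c > 0)

budget_left = C
for t in range(T):
    for arms in arm_sets:
        a = rng.choice(arms)                   # placeholder policy: play uniformly at random
        reward = float(rng.random() < mu[a])   # clean Bernoulli reward
        corruption = min(budget_left, 1.0)     # adversary spends from its remaining budget
        budget_left -= corruption
        observed = np.clip(reward - corruption, 0.0, 1.0)
        # A corruption-robust learner would update its estimates from `observed` here.

print(f"L/L_min = {L}/{L_min}; additive regret term scales as O((L/L_min) * C) = "
      f"O({L / L_min * C:.0f})")
```

Under this toy model, once `budget_left` is exhausted the observations coincide with the clean rewards again, which is why the corruption can contribute at most an additive term, here scaling as $O((L / L_{\min}) C)$ per the paper's bound, rather than degrading the regret rate itself.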