Stochastic Dueling Bandits with Adversarial Corruption
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:217-248, 2021.
The dueling bandits problem has received a lot of attention in recent years due to its applications in recommendation systems and information retrieval. However, due to the prevalence of malicious users in these systems, it is becoming increasingly important to design dueling bandit algorithms that are robust to corruptions introduced by these malicious users. In this paper we study dueling bandits in the presence of an adversary that can corrupt some of the feedback received by the learner. We propose an algorithm for this problem that is agnostic to the amount of corruption introduced by the adversary: its regret degrades gracefully with the amount of corruption, and in case of no corruption, it essentially matches the optimal regret bounds achievable in the purely stochastic dueling bandits setting.