Stochastic Dueling Bandits with Adversarial Corruption

Arpit Agarwal; Shivani Agarwal; Prathamesh Patil

Stochastic Dueling Bandits with Adversarial Corruption

Arpit Agarwal, Shivani Agarwal, Prathamesh Patil

Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:217-248, 2021.

Abstract

The dueling bandits problem has received a lot of attention in recent years due to its applications in recommendation systems and information retrieval. However, due to the prevalence of malicious users in these systems, it is becoming increasingly important to design dueling bandit algorithms that are robust to corruptions introduced by these malicious users. In this paper we study dueling bandits in the presence of an adversary that can corrupt some of the feedback received by the learner. We propose an algorithm for this problem that is agnostic to the amount of corruption introduced by the adversary: its regret degrades gracefully with the amount of corruption, and in case of no corruption, it essentially matches the optimal regret bounds achievable in the purely stochastic dueling bandits setting.

Cite this Paper

BibTeX


@InProceedings{pmlr-v132-agarwal21a,
  title = 	 {Stochastic Dueling Bandits with Adversarial Corruption},
  author =       {Agarwal, Arpit and Agarwal, Shivani and Patil, Prathamesh},
  booktitle = 	 {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  pages = 	 {217--248},
  year = 	 {2021},
  editor = 	 {Feldman, Vitaly and Ligett, Katrina and Sabato, Sivan},
  volume = 	 {132},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--19 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v132/agarwal21a/agarwal21a.pdf},
  url = 	 {https://proceedings.mlr.press/v132/agarwal21a.html},
  abstract = 	 {The dueling bandits problem has received a lot of attention in recent years due to its applications in recommendation systems and information retrieval. However, due to the prevalence of malicious users in these systems, it is becoming increasingly important to design dueling bandit algorithms that are robust to corruptions introduced by these malicious users. In this paper we study dueling bandits in the presence of an adversary that can corrupt some of the feedback received by the learner. We propose an algorithm for this problem that is agnostic to the amount of corruption introduced by the adversary: its regret degrades gracefully with the amount of corruption, and in case of no corruption, it essentially matches the optimal regret bounds achievable in the purely stochastic dueling bandits setting.}
}

Endnote

%0 Conference Paper
%T Stochastic Dueling Bandits with Adversarial Corruption
%A Arpit Agarwal
%A Shivani Agarwal
%A Prathamesh Patil
%B Proceedings of the 32nd International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Vitaly Feldman
%E Katrina Ligett
%E Sivan Sabato	
%F pmlr-v132-agarwal21a
%I PMLR
%P 217--248
%U https://proceedings.mlr.press/v132/agarwal21a.html
%V 132
%X The dueling bandits problem has received a lot of attention in recent years due to its applications in recommendation systems and information retrieval. However, due to the prevalence of malicious users in these systems, it is becoming increasingly important to design dueling bandit algorithms that are robust to corruptions introduced by these malicious users. In this paper we study dueling bandits in the presence of an adversary that can corrupt some of the feedback received by the learner. We propose an algorithm for this problem that is agnostic to the amount of corruption introduced by the adversary: its regret degrades gracefully with the amount of corruption, and in case of no corruption, it essentially matches the optimal regret bounds achievable in the purely stochastic dueling bandits setting.

APA


Agarwal, A., Agarwal, S. & Patil, P.. (2021). Stochastic Dueling Bandits with Adversarial Corruption. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:217-248 Available from https://proceedings.mlr.press/v132/agarwal21a.html.

Related Material

Download PDF