Identifying Copeland Winners in Dueling Bandits with Indifferences

Viktor Bengs; Björn Haddenhorst; Eyke Hüllermeier

Identifying Copeland Winners in Dueling Bandits with Indifferences

Viktor Bengs, Björn Haddenhorst, Eyke Hüllermeier

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:226-234, 2024.

Abstract

We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-bengs24a,
  title = 	 {Identifying {C}opeland Winners in Dueling Bandits with Indifferences},
  author =       {Bengs, Viktor and Haddenhorst, Bj\"{o}rn and H\"{u}llermeier, Eyke},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {226--234},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/bengs24a/bengs24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/bengs24a.html},
  abstract = 	 {We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.}
}

Endnote

%0 Conference Paper
%T Identifying Copeland Winners in Dueling Bandits with Indifferences
%A Viktor Bengs
%A Björn Haddenhorst
%A Eyke Hüllermeier
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li	
%F pmlr-v238-bengs24a
%I PMLR
%P 226--234
%U https://proceedings.mlr.press/v238/bengs24a.html
%V 238
%X We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.

APA

Bengs, V., Haddenhorst, B. & Hüllermeier, E.. (2024). Identifying Copeland Winners in Dueling Bandits with Indifferences. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:226-234 Available from https://proceedings.mlr.press/v238/bengs24a.html.

Identifying Copeland Winners in Dueling Bandits with Indifferences

Abstract

Cite this Paper

Related Material