Reducing Dueling Bandits to Cardinal Bandits

Nir Ailon; Zohar Karnin; Thorsten Joachims

Reducing Dueling Bandits to Cardinal Bandits

Nir Ailon, Zohar Karnin, Thorsten Joachims

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):856-864, 2014.

Abstract

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form “A is preferred to B” (as opposed to cardinal feedback like “A has value 2.5”), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions – named \Doubler, \MultiSbm and \DoubleSbm – provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For \Doubler and \MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of \DoubleSbm which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-ailon14,
  title = 	 {Reducing Dueling Bandits to Cardinal Bandits},
  author = 	 {Ailon, Nir and Karnin, Zohar and Joachims, Thorsten},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {856--864},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/ailon14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/ailon14.html},
  abstract = 	 {We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form “A is preferred to B” (as opposed to cardinal feedback like “A has value 2.5”), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions – named \Doubler, \MultiSbm and \DoubleSbm – provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting.     For \Doubler and \MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of \DoubleSbm which empirically outperforms the other two as well as previous algorithms in our experiments.  In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.}
}

Endnote

%0 Conference Paper
%T Reducing Dueling Bandits to Cardinal Bandits
%A Nir Ailon
%A Zohar Karnin
%A Thorsten Joachims
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-ailon14
%I PMLR
%P 856--864
%U https://proceedings.mlr.press/v32/ailon14.html
%V 32
%N 2
%X We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form “A is preferred to B” (as opposed to cardinal feedback like “A has value 2.5”), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions – named \Doubler, \MultiSbm and \DoubleSbm – provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting.     For \Doubler and \MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of \DoubleSbm which empirically outperforms the other two as well as previous algorithms in our experiments.  In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.

RIS


TY  - CPAPER
TI  - Reducing Dueling Bandits to Cardinal Bandits
AU  - Nir Ailon
AU  - Zohar Karnin
AU  - Thorsten Joachims
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-ailon14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 856
EP  - 864
L1  - http://proceedings.mlr.press/v32/ailon14.pdf
UR  - https://proceedings.mlr.press/v32/ailon14.html
AB  - We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form “A is preferred to B” (as opposed to cardinal feedback like “A has value 2.5”), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions – named \Doubler, \MultiSbm and \DoubleSbm – provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting.     For \Doubler and \MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of \DoubleSbm which empirically outperforms the other two as well as previous algorithms in our experiments.  In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
ER  -

APA


Ailon, N., Karnin, Z. & Joachims, T.. (2014). Reducing Dueling Bandits to Cardinal Bandits. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):856-864 Available from https://proceedings.mlr.press/v32/ailon14.html.

Reducing Dueling Bandits to Cardinal Bandits

Abstract

Cite this Paper

Related Material