A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Pratik Gajane; Tanguy Urvoy; Fabrice Clérot

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Pratik Gajane, Tanguy Urvoy, Fabrice Clérot

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:218-227, 2015.

Abstract

We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-gajane15,
  title = 	 {A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits},
  author = 	 {Gajane, Pratik and Urvoy, Tanguy and Clérot, Fabrice},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {218--227},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/gajane15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/gajane15.html},
  abstract = 	 {We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.}
}

Endnote

%0 Conference Paper
%T A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
%A Pratik Gajane
%A Tanguy Urvoy
%A Fabrice Clérot
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-gajane15
%I PMLR
%P 218--227
%U https://proceedings.mlr.press/v37/gajane15.html
%V 37
%X We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.

RIS


TY  - CPAPER
TI  - A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
AU  - Pratik Gajane
AU  - Tanguy Urvoy
AU  - Fabrice Clérot
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-gajane15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 218
EP  - 227
L1  - http://proceedings.mlr.press/v37/gajane15.pdf
UR  - https://proceedings.mlr.press/v37/gajane15.html
AB  - We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.
ER  -

APA


Gajane, P., Urvoy, T. & Clérot, F.. (2015). A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:218-227 Available from https://proceedings.mlr.press/v37/gajane15.html.

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Abstract

Cite this Paper

Related Material