Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

Junpei Komiyama; Junya Honda; Hisashi Kashima; Hiroshi Nakagawa

Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa

Proceedings of The 28th Conference on Learning Theory, PMLR 40:1141-1154, 2015.

Abstract

We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.

Cite this Paper

BibTeX


@InProceedings{pmlr-v40-Komiyama15,
  title = 	 {Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem},
  author = 	 {Komiyama, Junpei and Honda, Junya and Kashima, Hisashi and Nakagawa, Hiroshi},
  booktitle = 	 {Proceedings of The 28th Conference on Learning Theory},
  pages = 	 {1141--1154},
  year = 	 {2015},
  editor = 	 {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume = 	 {40},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Paris, France},
  month = 	 {03--06 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v40/Komiyama15.pdf},
  url = 	 {https://proceedings.mlr.press/v40/Komiyama15.html},
  abstract = 	 {We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.}
}

Endnote

%0 Conference Paper
%T Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem
%A Junpei Komiyama
%A Junya Honda
%A Hisashi Kashima
%A Hiroshi Nakagawa
%B Proceedings of The 28th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2015
%E Peter Grünwald
%E Elad Hazan
%E Satyen Kale	
%F pmlr-v40-Komiyama15
%I PMLR
%P 1141--1154
%U https://proceedings.mlr.press/v40/Komiyama15.html
%V 40
%X We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.

RIS


TY  - CPAPER
TI  - Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem
AU  - Junpei Komiyama
AU  - Junya Honda
AU  - Hisashi Kashima
AU  - Hiroshi Nakagawa
BT  - Proceedings of The 28th Conference on Learning Theory
DA  - 2015/06/26
ED  - Peter Grünwald
ED  - Elad Hazan
ED  - Satyen Kale	
ID  - pmlr-v40-Komiyama15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 40
SP  - 1141
EP  - 1154
L1  - http://proceedings.mlr.press/v40/Komiyama15.pdf
UR  - https://proceedings.mlr.press/v40/Komiyama15.html
AB  - We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.
ER  -

APA


Komiyama, J., Honda, J., Kashima, H. & Nakagawa, H.. (2015). Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:1141-1154 Available from https://proceedings.mlr.press/v40/Komiyama15.html.

Related Material

Download PDF