Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa
Proceedings of The 28th Conference on Learning Theory, PMLR 40:1141-1154, 2015.

Abstract

We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.
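To make the feedback model in the abstract concrete, the sketch below simulates a K-armed dueling bandit: at each round a pair of arms is compared and only the binary outcome of that comparison is observed. This is an illustration of the problem setting only, not the paper's proposed algorithm; the preference matrix `mu`, the uniform-random pair selection, and the particular regret definition (average suboptimality of the two chosen arms relative to the Condorcet winner) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# mu[i, j] = probability that arm i beats arm j in a single comparison.
# Arm 0 is the Condorcet winner here because mu[0, j] > 0.5 for all j != 0.
# The numbers are made up for illustration.
mu = np.array([
    [0.5, 0.6, 0.7, 0.8],
    [0.4, 0.5, 0.6, 0.7],
    [0.3, 0.4, 0.5, 0.6],
    [0.2, 0.3, 0.4, 0.5],
])
K = mu.shape[0]
winner = 0  # index of the Condorcet winner

T = 10_000
cumulative_regret = 0.0
for t in range(T):
    # A real dueling bandit algorithm would pick the pair based on past
    # comparison outcomes; here we simply draw a pair uniformly at random.
    i, j = rng.choice(K, size=2, replace=False)

    # Relative feedback: we only learn which of the two arms won this duel.
    outcome = rng.random() < mu[i, j]  # True if arm i beats arm j
    # (a learning algorithm would update its empirical win counts with `outcome`)

    # One common regret notion: the average gap of the two played arms
    # from the Condorcet winner, measured by winning probability minus 1/2.
    regret = ((mu[winner, i] - 0.5) + (mu[winner, j] - 0.5)) / 2
    cumulative_regret += regret

print(f"cumulative regret after {T} rounds: {cumulative_regret:.1f}")
```

A random-pair baseline like this incurs regret linear in T; the point of the paper is that an algorithm exploiting the observed comparisons can achieve logarithmic regret with a constant matching the information-divergence lower bound.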

Cite this Paper


BibTeX
@InProceedings{pmlr-v40-Komiyama15,
  title     = {Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem},
  author    = {Komiyama, Junpei and Honda, Junya and Kashima, Hisashi and Nakagawa, Hiroshi},
  booktitle = {Proceedings of The 28th Conference on Learning Theory},
  pages     = {1141--1154},
  year      = {2015},
  editor    = {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume    = {40},
  series    = {Proceedings of Machine Learning Research},
  address   = {Paris, France},
  month     = {03--06 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v40/Komiyama15.pdf},
  url       = {https://proceedings.mlr.press/v40/Komiyama15.html},
  abstract  = {We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.}
}
APA
Komiyama, J., Honda, J., Kashima, H. & Nakagawa, H. (2015). Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:1141-1154. Available from https://proceedings.mlr.press/v40/Komiyama15.html.