Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Junpei Komiyama, Junya Honda, Hiroshi Nakagawa
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1152-1161, 2015.

Abstract

We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it is revealed to have an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play MAB problem, and discuss its regret analysis. We prove that MP-TS has the optimal regret upper bound that matches the regret lower bound provided by Anantharam et al.\,(1987). Therefore, MP-TS is the first computationally efficient algorithm with optimal regret. A set of computer simulations was also conducted, which compared MP-TS with state-of-the-art algorithms. We also propose a modification of MP-TS, which is shown to have better empirical performance.

Cite this Paper


BibTeX
@InProceedings{pmlr-v37-komiyama15, title = {Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays}, author = {Komiyama, Junpei and Honda, Junya and Nakagawa, Hiroshi}, booktitle = {Proceedings of the 32nd International Conference on Machine Learning}, pages = {1152--1161}, year = {2015}, editor = {Bach, Francis and Blei, David}, volume = {37}, series = {Proceedings of Machine Learning Research}, address = {Lille, France}, month = {07--09 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v37/komiyama15.pdf}, url = {https://proceedings.mlr.press/v37/komiyama15.html}, abstract = {We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it is revealed to have an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play MAB problem, and discuss its regret analysis. We prove that MP-TS has the optimal regret upper bound that matches the regret lower bound provided by Anantharam et al.\,(1987). Therefore, MP-TS is the first computationally efficient algorithm with optimal regret. A set of computer simulations was also conducted, which compared MP-TS with state-of-the-art algorithms. We also propose a modification of MP-TS, which is shown to have better empirical performance.} }
Endnote
%0 Conference Paper %T Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays %A Junpei Komiyama %A Junya Honda %A Hiroshi Nakagawa %B Proceedings of the 32nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2015 %E Francis Bach %E David Blei %F pmlr-v37-komiyama15 %I PMLR %P 1152--1161 %U https://proceedings.mlr.press/v37/komiyama15.html %V 37 %X We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it is revealed to have an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play MAB problem, and discuss its regret analysis. We prove that MP-TS has the optimal regret upper bound that matches the regret lower bound provided by Anantharam et al.\,(1987). Therefore, MP-TS is the first computationally efficient algorithm with optimal regret. A set of computer simulations was also conducted, which compared MP-TS with state-of-the-art algorithms. We also propose a modification of MP-TS, which is shown to have better empirical performance.
RIS
TY - CPAPER TI - Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays AU - Junpei Komiyama AU - Junya Honda AU - Hiroshi Nakagawa BT - Proceedings of the 32nd International Conference on Machine Learning DA - 2015/06/01 ED - Francis Bach ED - David Blei ID - pmlr-v37-komiyama15 PB - PMLR DP - Proceedings of Machine Learning Research VL - 37 SP - 1152 EP - 1161 L1 - http://proceedings.mlr.press/v37/komiyama15.pdf UR - https://proceedings.mlr.press/v37/komiyama15.html AB - We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it is revealed to have an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play MAB problem, and discuss its regret analysis. We prove that MP-TS has the optimal regret upper bound that matches the regret lower bound provided by Anantharam et al.\,(1987). Therefore, MP-TS is the first computationally efficient algorithm with optimal regret. A set of computer simulations was also conducted, which compared MP-TS with state-of-the-art algorithms. We also propose a modification of MP-TS, which is shown to have better empirical performance. ER -
APA
Komiyama, J., Honda, J. & Nakagawa, H.. (2015). Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1152-1161 Available from https://proceedings.mlr.press/v37/komiyama15.html.

Related Material