Further Optimal Regret Bounds for Thompson Sampling

Shipra Agrawal; Navin Goyal

Further Optimal Regret Bounds for Thompson Sampling

Shipra Agrawal, Navin Goyal

Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR 31:99-107, 2013.

Abstract

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have comparable or better empirical performance compared to the state of the art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that proves the first near-optimal problem-independent bound of O(\sqrtNT\ln T) on the expected regret of this algorithm. Our novel martingale-based analysis techniques are conceptually simple, and easily extend to distributions other than the Beta distribution. For the version of Thompson Sampling that uses Gaussian priors, we prove a problem-independent bound of O(\sqrtNT\ln N) on the expected regret, and demonstrate the optimality of this bound by providing a matching lower bound. This lower bound of Ω(\sqrtNT\ln N) is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of O(\sqrtNT) for the multi-armed bandit problem. Our near-optimal problem-independent bounds for Thompson Sampling solve a COLT 2012 open problem of Chapelle and Li. Additionally, our techniques simultaneously provide the optimal problem-dependent bound of (1+ε)\sum_i \frac\ln Td(\mu_i, \mu_1)+O(\fracNε^2) on the expected regret. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [2012].

Cite this Paper

BibTeX


@InProceedings{pmlr-v31-agrawal13a,
  title = 	 {Further Optimal Regret Bounds for Thompson Sampling},
  author = 	 {Agrawal, Shipra and Goyal, Navin},
  booktitle = 	 {Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {99--107},
  year = 	 {2013},
  editor = 	 {Carvalho, Carlos M. and Ravikumar, Pradeep},
  volume = 	 {31},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Scottsdale, Arizona, USA},
  month = 	 {29 Apr--01 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v31/agrawal13a.pdf},
  url = 	 {https://proceedings.mlr.press/v31/agrawal13a.html},
  abstract = 	 {Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have comparable or better empirical performance compared to the state of the art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that proves the first near-optimal problem-independent  bound of O(\sqrtNT\ln T) on the expected regret of this algorithm. Our novel martingale-based analysis techniques are conceptually simple, and easily extend to distributions other than the Beta distribution. For the version of Thompson Sampling that uses Gaussian priors, we prove a problem-independent bound of O(\sqrtNT\ln N) on the expected regret, and demonstrate the optimality of this bound by providing a matching lower bound. This lower bound of Ω(\sqrtNT\ln N) is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of O(\sqrtNT) for the multi-armed bandit problem. Our near-optimal problem-independent bounds for Thompson Sampling solve a COLT 2012 open problem of Chapelle and Li. Additionally, our techniques simultaneously provide the optimal problem-dependent bound of (1+ε)\sum_i \frac\ln Td(\mu_i, \mu_1)+O(\fracNε^2) on the expected regret.  The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [2012].}
}

Endnote

%0 Conference Paper
%T Further Optimal Regret Bounds for Thompson Sampling
%A Shipra Agrawal
%A Navin Goyal
%B Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2013
%E Carlos M. Carvalho
%E Pradeep Ravikumar	
%F pmlr-v31-agrawal13a
%I PMLR
%P 99--107
%U https://proceedings.mlr.press/v31/agrawal13a.html
%V 31
%X Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have comparable or better empirical performance compared to the state of the art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that proves the first near-optimal problem-independent  bound of O(\sqrtNT\ln T) on the expected regret of this algorithm. Our novel martingale-based analysis techniques are conceptually simple, and easily extend to distributions other than the Beta distribution. For the version of Thompson Sampling that uses Gaussian priors, we prove a problem-independent bound of O(\sqrtNT\ln N) on the expected regret, and demonstrate the optimality of this bound by providing a matching lower bound. This lower bound of Ω(\sqrtNT\ln N) is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of O(\sqrtNT) for the multi-armed bandit problem. Our near-optimal problem-independent bounds for Thompson Sampling solve a COLT 2012 open problem of Chapelle and Li. Additionally, our techniques simultaneously provide the optimal problem-dependent bound of (1+ε)\sum_i \frac\ln Td(\mu_i, \mu_1)+O(\fracNε^2) on the expected regret.  The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [2012].

RIS


TY  - CPAPER
TI  - Further Optimal Regret Bounds for Thompson Sampling
AU  - Shipra Agrawal
AU  - Navin Goyal
BT  - Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics
DA  - 2013/04/29
ED  - Carlos M. Carvalho
ED  - Pradeep Ravikumar	
ID  - pmlr-v31-agrawal13a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 31
SP  - 99
EP  - 107
L1  - http://proceedings.mlr.press/v31/agrawal13a.pdf
UR  - https://proceedings.mlr.press/v31/agrawal13a.html
AB  - Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have comparable or better empirical performance compared to the state of the art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that proves the first near-optimal problem-independent  bound of O(\sqrtNT\ln T) on the expected regret of this algorithm. Our novel martingale-based analysis techniques are conceptually simple, and easily extend to distributions other than the Beta distribution. For the version of Thompson Sampling that uses Gaussian priors, we prove a problem-independent bound of O(\sqrtNT\ln N) on the expected regret, and demonstrate the optimality of this bound by providing a matching lower bound. This lower bound of Ω(\sqrtNT\ln N) is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of O(\sqrtNT) for the multi-armed bandit problem. Our near-optimal problem-independent bounds for Thompson Sampling solve a COLT 2012 open problem of Chapelle and Li. Additionally, our techniques simultaneously provide the optimal problem-dependent bound of (1+ε)\sum_i \frac\ln Td(\mu_i, \mu_1)+O(\fracNε^2) on the expected regret.  The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [2012].
ER  -

APA


Agrawal, S. & Goyal, N.. (2013). Further Optimal Regret Bounds for Thompson Sampling. Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 31:99-107 Available from https://proceedings.mlr.press/v31/agrawal13a.html.

Further Optimal Regret Bounds for Thompson Sampling

Abstract

Cite this Paper

Related Material