Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors

Junya Honda; Akimichi Takemura

Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR 33:375-383, 2014.

Abstract

In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality is only proved for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and variances as one of the most fundamental examples of multiparameter models. First we prove that the expected regret of TS with the uniform prior achieves the theoretical bound, which is the first result to show that the asymptotic bound is achievable for the normal distribution model. Next we prove that TS with Jeffreys prior and reference prior cannot achieve the theoretical bound. Therefore choice of priors is important for TS and non-informative priors are sometimes risky in cases of multiparameter models.
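To illustrate the setting the abstract describes, here is a minimal Python sketch of Thompson sampling for Gaussian arms with unknown mean and variance. It is not the authors' code. It assumes a prior family of the form π(μ, σ²) ∝ (σ²)^(-1-α) on the (μ, σ²) parameterization, under which α = -1 gives the uniform prior, α = 0 the reference prior, and α = 1/2 the Jeffreys prior. Under that assumption the marginal posterior of μ after n observations is a Student-t distribution with n + 2α - 1 degrees of freedom, centred at the sample mean with scale sqrt(S / (n (n + 2α - 1))), where S is the sum of squared deviations. The function names, the `arms` interface, and the number of initial pulls are hypothetical choices made for this sketch.

```python
import numpy as np


def sample_mean_posterior(rewards, alpha, rng):
    """Draw one posterior sample of an arm's mean.

    Assumes the prior pi(mu, sigma^2) proportional to (sigma^2)^(-1 - alpha),
    which makes the marginal posterior of mu a Student-t distribution.
    """
    rewards = np.asarray(rewards, dtype=float)
    n = rewards.size
    dof = n + 2 * alpha - 1  # degrees of freedom of the Student-t posterior
    if dof <= 0:
        raise ValueError("too few observations for a proper posterior")
    xbar = rewards.mean()
    s = np.sum((rewards - xbar) ** 2)  # sum of squared deviations
    scale = np.sqrt(s / (n * dof))
    return xbar + scale * rng.standard_t(dof)


def thompson_sampling(arms, horizon, alpha, seed=0):
    """Run Thompson sampling and return the number of pulls per arm.

    `arms` is a list of callables; each takes a NumPy Generator and returns
    one reward (a hypothetical interface chosen for this sketch).
    """
    rng = np.random.default_rng(seed)
    # Assumption: pull each arm just often enough that the posterior is
    # proper (dof > 0) and the sample variance is defined (n >= 2).
    n_init = max(2, int(np.floor(1 - 2 * alpha)) + 1)
    history = [[draw(rng) for _ in range(n_init)] for draw in arms]
    for _ in range(horizon - n_init * len(arms)):
        # Sample a plausible mean for every arm and play the largest one.
        samples = [sample_mean_posterior(h, alpha, rng) for h in history]
        best = int(np.argmax(samples))
        history[best].append(arms[best](rng))
    return [len(h) for h in history]


if __name__ == "__main__":
    arms = [lambda r: r.normal(0.0, 1.0), lambda r: r.normal(1.0, 1.0)]
    for name, alpha in [("uniform", -1.0), ("reference", 0.0), ("Jeffreys", 0.5)]:
        print(name, thompson_sampling(arms, horizon=2000, alpha=alpha))
```

In this sketch the prior enters only through `alpha`, which changes the degrees of freedom and scale of the sampled means, so the three priors named in the abstract can be compared by changing a single argument. A short simulation like this cannot confirm or refute the asymptotic results stated above.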

Cite this Paper

BibTeX


@InProceedings{pmlr-v33-honda14,
  title = 	 {{Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors}},
  author = 	 {Honda, Junya and Takemura, Akimichi},
  booktitle = 	 {Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {375--383},
  year = 	 {2014},
  editor = 	 {Kaski, Samuel and Corander, Jukka},
  volume = 	 {33},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Reykjavik, Iceland},
  month = 	 {22--25 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v33/honda14.pdf},
  url = 	 {https://proceedings.mlr.press/v33/honda14.html},
  abstract = 	 {In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality is only proved for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and variances as one of the most fundamental examples of multiparameter models. First we prove that the expected regret of TS with the uniform prior achieves the theoretical bound, which is the first result to show that the asymptotic bound is achievable for the normal distribution model. Next we prove that TS with Jeffreys prior and reference prior cannot achieve the theoretical bound. Therefore choice of priors is important for TS and non-informative priors are sometimes risky in cases of multiparameter models.}
}

Endnote

%0 Conference Paper
%T Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors
%A Junya Honda
%A Akimichi Takemura
%B Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2014
%E Samuel Kaski
%E Jukka Corander
%F pmlr-v33-honda14
%I PMLR
%P 375--383
%U https://proceedings.mlr.press/v33/honda14.html
%V 33
%X In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality is only proved for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and variances as one of the most fundamental examples of multiparameter models. First we prove that the expected regret of TS with the uniform prior achieves the theoretical bound, which is the first result to show that the asymptotic bound is achievable for the normal distribution model. Next we prove that TS with Jeffreys prior and reference prior cannot achieve the theoretical bound. Therefore choice of priors is important for TS and non-informative priors are sometimes risky in cases of multiparameter models.

RIS


TY  - CPAPER
TI  - Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors
AU  - Junya Honda
AU  - Akimichi Takemura
BT  - Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics
DA  - 2014/04/02
ED  - Samuel Kaski
ED  - Jukka Corander
ID  - pmlr-v33-honda14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 33
SP  - 375
EP  - 383
L1  - http://proceedings.mlr.press/v33/honda14.pdf
UR  - https://proceedings.mlr.press/v33/honda14.html
AB  - In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality is only proved for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and variances as one of the most fundamental examples of multiparameter models. First we prove that the expected regret of TS with the uniform prior achieves the theoretical bound, which is the first result to show that the asymptotic bound is achievable for the normal distribution model. Next we prove that TS with Jeffreys prior and reference prior cannot achieve the theoretical bound. Therefore choice of priors is important for TS and non-informative priors are sometimes risky in cases of multiparameter models.
ER  -

APA


Honda, J. & Takemura, A. (2014). Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 33:375-383. Available from https://proceedings.mlr.press/v33/honda14.html.
