On Bayesian Upper Confidence Bounds for Bandit Problems

Emilie Kaufmann; Olivier Cappe; Aurelien Garivier

Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:592-600, 2012.

Abstract

Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the parameter is drawn from a prior distribution. We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of performance. We give a general formulation for a class of Bayesian index policies that rely on quantiles of the posterior distribution. For binary bandits, we prove that the corresponding algorithm, termed Bayes-UCB, satisfies finite-time regret bounds that imply its asymptotic optimality. More generally, Bayes-UCB appears as a unifying framework for several variants of the UCB algorithm addressing different bandit problems (parametric multi-armed bandits, Gaussian bandits with unknown mean and variance, linear bandits). But the generality of the Bayesian approach makes it possible to address more challenging models. In particular, we show how to handle linear bandits with sparsity constraints by resorting to Gibbs sampling.
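To make the idea of a posterior-quantile index policy concrete, here is a minimal sketch for binary (Bernoulli) rewards. It is an illustration under stated assumptions, not the authors' reference implementation: the uniform Beta(1, 1) prior, the quantile level 1 - 1/t, and the helper names `beta_cdf`, `beta_quantile`, and `bayes_ucb` are all choices made here, since the abstract does not specify them. The Beta quantile is computed with the standard library only, which is practical for short horizons.

```python
import math
import random


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(
        math.comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a, n + 1)
    )


def beta_quantile(p, a, b, tol=1e-6):
    """Invert beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def bayes_ucb(arm_means, horizon, seed=0):
    """Simulate a quantile-index policy on Bernoulli arms; return pull counts.

    Assumptions (not taken from the abstract): Beta(1, 1) prior on each arm
    and quantile level 1 - 1/t at round t.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for t in range(1, horizon + 1):
        level = 1.0 - 1.0 / t
        # Index of each arm: upper quantile of its Beta posterior.
        indices = [
            beta_quantile(level, 1 + successes[i], 1 + failures[i])
            for i in range(k)
        ]
        arm = max(range(k), key=indices.__getitem__)
        reward = 1 if rng.random() < arm_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(bayes_ucb([0.2, 0.8], horizon=200))
```

The quantile level approaches 1 as t grows, so arms that have been pulled rarely keep a wide posterior and a high index, which is what drives exploration in this sketch.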

Cite this Paper

BibTeX


@InProceedings{pmlr-v22-kaufmann12,
  title = 	 {On Bayesian Upper Confidence Bounds for Bandit Problems},
  author = 	 {Kaufmann, Emilie and Cappe, Olivier and Garivier, Aurelien},
  booktitle = 	 {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {592--600},
  year = 	 {2012},
  editor = 	 {Lawrence, Neil D. and Girolami, Mark},
  volume = 	 {22},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {La Palma, Canary Islands},
  month = 	 {21--23 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v22/kaufmann12/kaufmann12.pdf},
  url = 	 {https://proceedings.mlr.press/v22/kaufmann12.html},
  abstract = 	 {Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the parameter is drawn from a prior distribution. We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of performance. We give a general formulation for a class of Bayesian index policies that rely on quantiles of the posterior distribution. For binary bandits, we prove that the corresponding algorithm, termed Bayes-UCB, satisfies finite-time regret bounds that imply its asymptotic optimality. More generally, Bayes-UCB appears as a unifying framework for several variants of the UCB algorithm addressing different bandit problems (parametric multi-armed bandits, Gaussian bandits with unknown mean and variance, linear bandits). But the generality of the Bayesian approach makes it possible to address more challenging models. In particular, we show how to handle linear bandits with sparsity constraints by resorting to Gibbs sampling.}
}

Endnote

%0 Conference Paper
%T On Bayesian Upper Confidence Bounds for Bandit Problems
%A Emilie Kaufmann
%A Olivier Cappe
%A Aurelien Garivier
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-kaufmann12
%I PMLR
%P 592--600
%U https://proceedings.mlr.press/v22/kaufmann12.html
%V 22
%X Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the parameter is drawn from a prior distribution. We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of performance. We give a general formulation for a class of Bayesian index policies that rely on quantiles of the posterior distribution. For binary bandits, we prove that the corresponding algorithm, termed Bayes-UCB, satisfies finite-time regret bounds that imply its asymptotic optimality. More generally, Bayes-UCB appears as a unifying framework for several variants of the UCB algorithm addressing different bandit problems (parametric multi-armed bandits, Gaussian bandits with unknown mean and variance, linear bandits). But the generality of the Bayesian approach makes it possible to address more challenging models. In particular, we show how to handle linear bandits with sparsity constraints by resorting to Gibbs sampling.

RIS


TY  - CPAPER
TI  - On Bayesian Upper Confidence Bounds for Bandit Problems
AU  - Emilie Kaufmann
AU  - Olivier Cappe
AU  - Aurelien Garivier
BT  - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA  - 2012/04/21
ED  - Neil D. Lawrence
ED  - Mark Girolami
ID  - pmlr-v22-kaufmann12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 22
SP  - 592
EP  - 600
L1  - http://proceedings.mlr.press/v22/kaufmann12/kaufmann12.pdf
UR  - https://proceedings.mlr.press/v22/kaufmann12.html
AB  - Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the parameter is drawn from a prior distribution. We show in this paper that methods derived from this second perspective prove optimal when evaluated using the frequentist cumulated regret as a measure of performance. We give a general formulation for a class of Bayesian index policies that rely on quantiles of the posterior distribution. For binary bandits, we prove that the corresponding algorithm, termed Bayes-UCB, satisfies finite-time regret bounds that imply its asymptotic optimality. More generally, Bayes-UCB appears as a unifying framework for several variants of the UCB algorithm addressing different bandit problems (parametric multi-armed bandits, Gaussian bandits with unknown mean and variance, linear bandits). But the generality of the Bayesian approach makes it possible to address more challenging models. In particular, we show how to handle linear bandits with sparsity constraints by resorting to Gibbs sampling.
ER  -

APA


Kaufmann, E., Cappe, O. & Garivier, A. (2012). On Bayesian Upper Confidence Bounds for Bandit Problems. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:592-600. Available from https://proceedings.mlr.press/v22/kaufmann12.html.
