A Stochastic Bandit Algorithm for Scratch Games

Raphaël Féraud; Tanguy Urvoy

A Stochastic Bandit Algorithm for Scratch Games

Raphaël Féraud, Tanguy Urvoy

Proceedings of the Asian Conference on Machine Learning, PMLR 25:129-143, 2012.

Abstract

Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. The algorithms based on upper confidence bounds offer strong theoretical guarantees, they are easy to implement and efficient in practice. We considers a new bandit setting, called "scratch-games", where arm budgets are limited and reward are drawn without replacement. Using Serfling inequality, we propose an upper confidence bound algorithm adapted to this setting. We show that the bound of expectation to play a suboptimal arm is lower than the one of UCB1 policy. We illustrate this result on both synthetic problems and realistic problems (ad-serving and emailing campaigns optimization).

Cite this Paper

BibTeX


@InProceedings{pmlr-v25-feraud12,
  title = 	 {A Stochastic Bandit Algorithm for Scratch Games},
  author = 	 {Féraud, Raphaël and Urvoy, Tanguy},
  booktitle = 	 {Proceedings of the Asian Conference on Machine Learning},
  pages = 	 {129--143},
  year = 	 {2012},
  editor = 	 {Hoi, Steven C. H. and Buntine, Wray},
  volume = 	 {25},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Singapore Management University, Singapore},
  month = 	 {04--06 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v25/feraud12/feraud12.pdf},
  url = 	 {https://proceedings.mlr.press/v25/feraud12.html},
  abstract = 	 {Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. The algorithms based on upper confidence bounds offer strong theoretical guarantees, they are easy to implement and efficient in practice. We considers a new bandit setting, called "scratch-games", where arm budgets are limited and reward are drawn without replacement. Using Serfling inequality, we propose an upper confidence bound algorithm adapted to this setting. We show that the bound of expectation to play a suboptimal arm is lower than the one of UCB1 policy. We illustrate this result on both synthetic problems and realistic problems (ad-serving and emailing campaigns optimization).}
}

Endnote

%0 Conference Paper
%T A Stochastic Bandit Algorithm for Scratch Games
%A Raphaël Féraud
%A Tanguy Urvoy
%B Proceedings of the Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2012
%E Steven C. H. Hoi
%E Wray Buntine	
%F pmlr-v25-feraud12
%I PMLR
%P 129--143
%U https://proceedings.mlr.press/v25/feraud12.html
%V 25
%X Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. The algorithms based on upper confidence bounds offer strong theoretical guarantees, they are easy to implement and efficient in practice. We considers a new bandit setting, called "scratch-games", where arm budgets are limited and reward are drawn without replacement. Using Serfling inequality, we propose an upper confidence bound algorithm adapted to this setting. We show that the bound of expectation to play a suboptimal arm is lower than the one of UCB1 policy. We illustrate this result on both synthetic problems and realistic problems (ad-serving and emailing campaigns optimization).

RIS


TY  - CPAPER
TI  - A Stochastic Bandit Algorithm for Scratch Games
AU  - Raphaël Féraud
AU  - Tanguy Urvoy
BT  - Proceedings of the Asian Conference on Machine Learning
DA  - 2012/11/17
ED  - Steven C. H. Hoi
ED  - Wray Buntine	
ID  - pmlr-v25-feraud12
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 25
SP  - 129
EP  - 143
L1  - http://proceedings.mlr.press/v25/feraud12/feraud12.pdf
UR  - https://proceedings.mlr.press/v25/feraud12.html
AB  - Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. The algorithms based on upper confidence bounds offer strong theoretical guarantees, they are easy to implement and efficient in practice. We considers a new bandit setting, called "scratch-games", where arm budgets are limited and reward are drawn without replacement. Using Serfling inequality, we propose an upper confidence bound algorithm adapted to this setting. We show that the bound of expectation to play a suboptimal arm is lower than the one of UCB1 policy. We illustrate this result on both synthetic problems and realistic problems (ad-serving and emailing campaigns optimization).
ER  -

APA


Féraud, R. & Urvoy, T.. (2012). A Stochastic Bandit Algorithm for Scratch Games. Proceedings of the Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 25:129-143 Available from https://proceedings.mlr.press/v25/feraud12.html.

Related Material

Download PDF