Maximin Action Identification: A New Bandit Framework for Games

Aurélien Garivier; Emilie Kaufmann; Wouter M. Koolen

Maximin Action Identification: A New Bandit Framework for Games

Aurélien Garivier, Emilie Kaufmann, Wouter M. Koolen

29th Annual Conference on Learning Theory, PMLR 49:1028-1050, 2016.

Abstract

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper- confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.

Cite this Paper

BibTeX


@InProceedings{pmlr-v49-garivier16b,
  title = 	 {Maximin Action Identification: A New Bandit Framework for Games},
  author = 	 {Garivier, Aurélien and Kaufmann, Emilie and Koolen, Wouter M.},
  booktitle = 	 {29th Annual Conference on Learning Theory},
  pages = 	 {1028--1050},
  year = 	 {2016},
  editor = 	 {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume = 	 {49},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Columbia University, New York, New York, USA},
  month = 	 {23--26 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v49/garivier16b.pdf},
  url = 	 {https://proceedings.mlr.press/v49/garivier16b.html},
  abstract = 	 {We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper- confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.}
}

Endnote

%0 Conference Paper
%T Maximin Action Identification: A New Bandit Framework for Games
%A Aurélien Garivier
%A Emilie Kaufmann
%A Wouter M. Koolen
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir	
%F pmlr-v49-garivier16b
%I PMLR
%P 1028--1050
%U https://proceedings.mlr.press/v49/garivier16b.html
%V 49
%X We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper- confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.

RIS


TY  - CPAPER
TI  - Maximin Action Identification: A New Bandit Framework for Games
AU  - Aurélien Garivier
AU  - Emilie Kaufmann
AU  - Wouter M. Koolen
BT  - 29th Annual Conference on Learning Theory
DA  - 2016/06/06
ED  - Vitaly Feldman
ED  - Alexander Rakhlin
ED  - Ohad Shamir	
ID  - pmlr-v49-garivier16b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 49
SP  - 1028
EP  - 1050
L1  - http://proceedings.mlr.press/v49/garivier16b.pdf
UR  - https://proceedings.mlr.press/v49/garivier16b.html
AB  - We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper- confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.
ER  -

APA


Garivier, A., Kaufmann, E. & Koolen, W.M.. (2016). Maximin Action Identification: A New Bandit Framework for Games. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:1028-1050 Available from https://proceedings.mlr.press/v49/garivier16b.html.

Related Material

Download PDF