Planning in Reward-Rich Domains via PAC Bandits

Sergiu Goschin; Ari Weinstein; Michael L. Littman; Erick Chastain

Planning in Reward-Rich Domains via PAC Bandits

Sergiu Goschin, Ari Weinstein, Michael L. Littman, Erick Chastain

Proceedings of the Tenth European Workshop on Reinforcement Learning, PMLR 24:25-42, 2013.

Abstract

In some decision-making environments, successful solutions are common. If the evaluation of candidate solutions is noisy, however, the challenge is knowing when a “good enough” answer has been found. We formalize this problem as an infinite-armed bandit and provide upper and lower bounds on the number of evaluations or “pulls” needed to identify a solution whose evaluation exceeds a given threshold r0 . We present several algorithms and use them to identify reliable strategies for solving screens from the video games \emphInfinite Mario and \emphPitfall! We show order of magnitude improvements in sample complexity over a natural approach that pulls each arm until a good estimate of its success probability is known.

Cite this Paper

BibTeX


@InProceedings{pmlr-v24-goschin12a,
  title = 	 {Planning in Reward-Rich Domains via PAC Bandits},
  author = 	 {Goschin, Sergiu and Weinstein, Ari and Littman, Michael L. and Chastain, Erick},
  booktitle = 	 {Proceedings of the Tenth European Workshop on Reinforcement Learning},
  pages = 	 {25--42},
  year = 	 {2013},
  editor = 	 {Deisenroth, Marc Peter and Szepesvári, Csaba and Peters, Jan},
  volume = 	 {24},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Edinburgh, Scotland},
  month = 	 {30 Jun--01 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v24/goschin12a/goschin12a.pdf},
  url = 	 {https://proceedings.mlr.press/v24/goschin12a.html},
  abstract = 	 {In some decision-making environments, successful solutions are common. If the evaluation of candidate solutions is noisy, however, the challenge is knowing when a “good enough” answer has been found. We formalize this problem as an infinite-armed bandit and provide upper and lower bounds on the number of evaluations or “pulls” needed to identify a solution whose evaluation exceeds a given threshold r0 . We present several algorithms and use them to identify reliable strategies for solving screens from the video games \emphInfinite Mario and \emphPitfall! We show order of magnitude improvements in sample complexity over a natural approach that pulls each arm until a good estimate of its success probability is known.}
}

Endnote

%0 Conference Paper
%T Planning in Reward-Rich Domains via PAC Bandits
%A Sergiu Goschin
%A Ari Weinstein
%A Michael L. Littman
%A Erick Chastain
%B Proceedings of the Tenth European Workshop on Reinforcement Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Marc Peter Deisenroth
%E Csaba Szepesvári
%E Jan Peters	
%F pmlr-v24-goschin12a
%I PMLR
%P 25--42
%U https://proceedings.mlr.press/v24/goschin12a.html
%V 24
%X In some decision-making environments, successful solutions are common. If the evaluation of candidate solutions is noisy, however, the challenge is knowing when a “good enough” answer has been found. We formalize this problem as an infinite-armed bandit and provide upper and lower bounds on the number of evaluations or “pulls” needed to identify a solution whose evaluation exceeds a given threshold r0 . We present several algorithms and use them to identify reliable strategies for solving screens from the video games \emphInfinite Mario and \emphPitfall! We show order of magnitude improvements in sample complexity over a natural approach that pulls each arm until a good estimate of its success probability is known.

RIS


TY  - CPAPER
TI  - Planning in Reward-Rich Domains via PAC Bandits
AU  - Sergiu Goschin
AU  - Ari Weinstein
AU  - Michael L. Littman
AU  - Erick Chastain
BT  - Proceedings of the Tenth European Workshop on Reinforcement Learning
DA  - 2013/01/12
ED  - Marc Peter Deisenroth
ED  - Csaba Szepesvári
ED  - Jan Peters	
ID  - pmlr-v24-goschin12a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 24
SP  - 25
EP  - 42
L1  - http://proceedings.mlr.press/v24/goschin12a/goschin12a.pdf
UR  - https://proceedings.mlr.press/v24/goschin12a.html
AB  - In some decision-making environments, successful solutions are common. If the evaluation of candidate solutions is noisy, however, the challenge is knowing when a “good enough” answer has been found. We formalize this problem as an infinite-armed bandit and provide upper and lower bounds on the number of evaluations or “pulls” needed to identify a solution whose evaluation exceeds a given threshold r0 . We present several algorithms and use them to identify reliable strategies for solving screens from the video games \emphInfinite Mario and \emphPitfall! We show order of magnitude improvements in sample complexity over a natural approach that pulls each arm until a good estimate of its success probability is known.
ER  -

APA


Goschin, S., Weinstein, A., Littman, M.L. & Chastain, E.. (2013). Planning in Reward-Rich Domains via PAC Bandits. Proceedings of the Tenth European Workshop on Reinforcement Learning, in Proceedings of Machine Learning Research 24:25-42 Available from https://proceedings.mlr.press/v24/goschin12a.html.

Related Material

Download PDF