Planning in Reward-Rich Domains via PAC Bandits

Sergiu Goschin, Ari Weinstein, Michael L. Littman, Erick Chastain
Proceedings of the Tenth European Workshop on Reinforcement Learning, PMLR 24:25-42, 2013.

Abstract

In some decision-making environments, successful solutions are common. If the evaluation of candidate solutions is noisy, however, the challenge is knowing when a “good enough” answer has been found. We formalize this problem as an infinite-armed bandit and provide upper and lower bounds on the number of evaluations or “pulls” needed to identify a solution whose evaluation exceeds a given threshold r0. We present several algorithms and use them to identify reliable strategies for solving screens from the video games Infinite Mario and Pitfall! We show order-of-magnitude improvements in sample complexity over a natural approach that pulls each arm until a good estimate of its success probability is known.
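
To make the formalization concrete, below is a minimal sketch, in Python, of the threshold search the abstract describes: repeatedly sample a candidate arm and pull it until a confidence interval places its success probability above or below the threshold. Everything here is an illustrative assumption rather than the paper's method: the names find_good_arm, sample_arm, and pull, the Hoeffding-style stopping rule, and the accuracy/confidence parameters eps and delta are all invented for this sketch.

import math
import random


def find_good_arm(sample_arm, pull, r0, eps=0.1, delta=0.05, max_pulls=10_000):
    """Search an infinite reservoir of Bernoulli arms for one whose
    success probability exceeds (roughly) the threshold r0.

    sample_arm() draws a fresh candidate arm; pull(arm) evaluates it once,
    returning 1 on success and 0 on failure. With probability at least
    1 - O(delta), the returned arm has true mean >= r0 - eps.
    This is an illustrative sketch, not the paper's algorithms.
    """
    k = 0  # number of candidate arms sampled so far
    while True:
        k += 1
        arm = sample_arm()
        successes = 0
        for n in range(1, max_pulls + 1):
            successes += pull(arm)
            mean = successes / n
            # Hoeffding radius; the (k * n)^2 factor is a loose union
            # bound over all arms and all pull counts, so the total
            # failure probability stays O(delta).
            rad = math.sqrt(math.log(4.0 * (k * n) ** 2 / delta) / (2 * n))
            if mean - rad >= r0 - eps:
                return arm  # confidently good enough: accept
            if mean + rad < r0:
                break       # confidently below threshold: reject
        # Arms still ambiguous after max_pulls are rejected as well, a
        # pragmatic simplification for borderline arms near r0 - eps.


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical reward-rich domain: 30% of candidate "solutions"
    # succeed 90% of the time; the rest succeed only 20% of the time.
    def sample_arm():
        return 0.9 if rng.random() < 0.3 else 0.2

    def pull(p):
        return 1 if rng.random() < p else 0

    print("accepted arm with true mean", find_good_arm(sample_arm, pull, r0=0.8))

In a reward-rich domain, where a sizable fraction of arms clear the threshold, an accept/reject loop of this kind can stop long before every candidate's success probability is estimated precisely, which is the sort of saving over the naive approach that the paper quantifies.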

Cite this Paper


BibTeX
@InProceedings{pmlr-v24-goschin12a,
  title     = {Planning in Reward-Rich Domains via PAC Bandits},
  author    = {Goschin, Sergiu and Weinstein, Ari and Littman, Michael L. and Chastain, Erick},
  booktitle = {Proceedings of the Tenth European Workshop on Reinforcement Learning},
  pages     = {25--42},
  year      = {2013},
  editor    = {Deisenroth, Marc Peter and Szepesvári, Csaba and Peters, Jan},
  volume    = {24},
  series    = {Proceedings of Machine Learning Research},
  address   = {Edinburgh, Scotland},
  month     = {30 Jun--01 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v24/goschin12a/goschin12a.pdf},
  url       = {https://proceedings.mlr.press/v24/goschin12a.html},
  abstract  = {In some decision-making environments, successful solutions are common. If the evaluation of candidate solutions is noisy, however, the challenge is knowing when a “good enough” answer has been found. We formalize this problem as an infinite-armed bandit and provide upper and lower bounds on the number of evaluations or “pulls” needed to identify a solution whose evaluation exceeds a given threshold $r_0$. We present several algorithms and use them to identify reliable strategies for solving screens from the video games \emph{Infinite Mario} and \emph{Pitfall!} We show order-of-magnitude improvements in sample complexity over a natural approach that pulls each arm until a good estimate of its success probability is known.}
}
Endnote
%0 Conference Paper
%T Planning in Reward-Rich Domains via PAC Bandits
%A Sergiu Goschin
%A Ari Weinstein
%A Michael L. Littman
%A Erick Chastain
%B Proceedings of the Tenth European Workshop on Reinforcement Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Marc Peter Deisenroth
%E Csaba Szepesvári
%E Jan Peters
%F pmlr-v24-goschin12a
%I PMLR
%P 25--42
%U https://proceedings.mlr.press/v24/goschin12a.html
%V 24
%X In some decision-making environments, successful solutions are common. If the evaluation of candidate solutions is noisy, however, the challenge is knowing when a “good enough” answer has been found. We formalize this problem as an infinite-armed bandit and provide upper and lower bounds on the number of evaluations or “pulls” needed to identify a solution whose evaluation exceeds a given threshold r0. We present several algorithms and use them to identify reliable strategies for solving screens from the video games Infinite Mario and Pitfall! We show order-of-magnitude improvements in sample complexity over a natural approach that pulls each arm until a good estimate of its success probability is known.
RIS
TY  - CPAPER
TI  - Planning in Reward-Rich Domains via PAC Bandits
AU  - Sergiu Goschin
AU  - Ari Weinstein
AU  - Michael L. Littman
AU  - Erick Chastain
BT  - Proceedings of the Tenth European Workshop on Reinforcement Learning
DA  - 2013/01/12
ED  - Marc Peter Deisenroth
ED  - Csaba Szepesvári
ED  - Jan Peters
ID  - pmlr-v24-goschin12a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 24
SP  - 25
EP  - 42
L1  - http://proceedings.mlr.press/v24/goschin12a/goschin12a.pdf
UR  - https://proceedings.mlr.press/v24/goschin12a.html
AB  - In some decision-making environments, successful solutions are common. If the evaluation of candidate solutions is noisy, however, the challenge is knowing when a “good enough” answer has been found. We formalize this problem as an infinite-armed bandit and provide upper and lower bounds on the number of evaluations or “pulls” needed to identify a solution whose evaluation exceeds a given threshold r0. We present several algorithms and use them to identify reliable strategies for solving screens from the video games Infinite Mario and Pitfall! We show order-of-magnitude improvements in sample complexity over a natural approach that pulls each arm until a good estimate of its success probability is known.
ER  -
APA
Goschin, S., Weinstein, A., Littman, M.L. & Chastain, E. (2013). Planning in Reward-Rich Domains via PAC Bandits. Proceedings of the Tenth European Workshop on Reinforcement Learning, in Proceedings of Machine Learning Research 24:25-42. Available from https://proceedings.mlr.press/v24/goschin12a.html.
