Continuous Rapid Action Value Estimates

Adrien Couëtoux, Mario Milone, Mátyás Brendel, Hassan Doghmen, Michèle Sebag, Olivier Teytaud
Proceedings of the Asian Conference on Machine Learning, PMLR 20:19-31, 2011.

Abstract

In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristics to continuous action and state spaces. The approach is experimentally validated on two artificial benchmark problems: the treasure hunt game, and a real-world energy management problem.
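The RAVE heuristic blends the node's direct value estimate with an "all-moves-as-first" estimate gathered across simulations, fading the latter out as direct visits accumulate. As a rough illustration of the discrete-action starting point the paper extends (this sketch follows the β-schedule of Gelly and Silver's RAVE, not the continuous variant proposed here; the function name and constants are illustrative):

```python
import math

def rave_ucb_score(q, n, q_rave, n_rave, n_parent, c=1.4, k=1000):
    """Score one action under UCT with a RAVE blending term.

    q, n      : mean reward and visit count of the action (tree statistics)
    q_rave    : all-moves-as-first (RAVE) mean reward for the action
    n_rave    : RAVE visit count
    n_parent  : visit count of the parent node
    c, k      : exploration constant and RAVE equivalence parameter
                (illustrative values, not taken from the paper)
    """
    # Unvisited actions are explored first.
    if n == 0:
        return float("inf")
    # beta weights the RAVE estimate heavily when n is small and
    # fades it out as the direct estimate becomes reliable.
    beta = math.sqrt(k / (3 * n + k))
    value = (1 - beta) * q + beta * q_rave
    # Standard UCB exploration bonus.
    exploration = c * math.sqrt(math.log(n_parent) / n)
    return value + exploration
```

In continuous action and state spaces, per-action visit counts like `n` and `n_rave` are not directly available, which is the gap the paper's continuous RAVE addresses.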

Cite this Paper


BibTeX
@InProceedings{pmlr-v20-couetoux11,
  title     = {Continuous Rapid Action Value Estimates},
  author    = {Couëtoux, Adrien and Milone, Mario and Brendel, Mátyás and Doghmen, Hassan and Sebag, Michèle and Teytaud, Olivier},
  booktitle = {Proceedings of the Asian Conference on Machine Learning},
  pages     = {19--31},
  year      = {2011},
  editor    = {Hsu, Chun-Nan and Lee, Wee Sun},
  volume    = {20},
  series    = {Proceedings of Machine Learning Research},
  address   = {South Garden Hotels and Resorts, Taoyuan, Taiwan},
  month     = {14--15 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v20/couetoux11/couetoux11.pdf},
  url       = {https://proceedings.mlr.press/v20/couetoux11.html},
  abstract  = {In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristics to continuous action and state spaces. The approach is experimentally validated on two artificial benchmark problems: the treasure hunt game, and a real-world energy management problem.}
}
Endnote
%0 Conference Paper
%T Continuous Rapid Action Value Estimates
%A Adrien Couëtoux
%A Mario Milone
%A Mátyás Brendel
%A Hassan Doghmen
%A Michèle Sebag
%A Olivier Teytaud
%B Proceedings of the Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2011
%E Chun-Nan Hsu
%E Wee Sun Lee
%F pmlr-v20-couetoux11
%I PMLR
%P 19--31
%U https://proceedings.mlr.press/v20/couetoux11.html
%V 20
%X In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristics to continuous action and state spaces. The approach is experimentally validated on two artificial benchmark problems: the treasure hunt game, and a real-world energy management problem.
RIS
TY - CPAPER
TI - Continuous Rapid Action Value Estimates
AU - Adrien Couëtoux
AU - Mario Milone
AU - Mátyás Brendel
AU - Hassan Doghmen
AU - Michèle Sebag
AU - Olivier Teytaud
BT - Proceedings of the Asian Conference on Machine Learning
DA - 2011/11/17
ED - Chun-Nan Hsu
ED - Wee Sun Lee
ID - pmlr-v20-couetoux11
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 20
SP - 19
EP - 31
L1 - http://proceedings.mlr.press/v20/couetoux11/couetoux11.pdf
UR - https://proceedings.mlr.press/v20/couetoux11.html
AB - In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristics to continuous action and state spaces. The approach is experimentally validated on two artificial benchmark problems: the treasure hunt game, and a real-world energy management problem.
ER -
APA
Couëtoux, A., Milone, M., Brendel, M., Doghmen, H., Sebag, M. & Teytaud, O. (2011). Continuous Rapid Action Value Estimates. Proceedings of the Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 20:19-31. Available from https://proceedings.mlr.press/v20/couetoux11.html.