Continuous Rapid Action Value Estimates

Adrien Couëtoux; Mario Milone; Mátyás Brendel; Hassan Doghmen; Michèle Sebag; Olivier Teytaud

Proceedings of the Asian Conference on Machine Learning, PMLR 20:19-31, 2011.

Abstract

In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristics to continuous action and state spaces. The approach is experimentally validated on two artificial benchmark problems: the treasure hunt game, and a real-world energy management problem.

Cite this Paper

BibTeX


@InProceedings{pmlr-v20-couetoux11,
  title = 	 {Continuous Rapid Action Value Estimates},
  author = 	 {Couëtoux, Adrien and Milone, Mario and Brendel, Mátyás and Doghmen, Hassan and Sebag, Michèle and Teytaud, Olivier},
  booktitle = 	 {Proceedings of the Asian Conference on Machine Learning},
  pages = 	 {19--31},
  year = 	 {2011},
  editor = 	 {Hsu, Chun-Nan and Lee, Wee Sun},
  volume = 	 {20},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {South Garden Hotels and Resorts, Taoyuan, Taiwan},
  month = 	 {14--15 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v20/couetoux11/couetoux11.pdf},
  url = 	 {https://proceedings.mlr.press/v20/couetoux11.html},
  abstract = 	 {In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristics to continuous action and state spaces. The approach is experimentally validated on two artificial benchmark problems: the treasure hunt game, and a real-world energy management problem.}
}

Endnote

%0 Conference Paper
%T Continuous Rapid Action Value Estimates
%A Adrien Couëtoux
%A Mario Milone
%A Mátyás Brendel
%A Hassan Doghmen
%A Michèle Sebag
%A Olivier Teytaud
%B Proceedings of the Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2011
%E Chun-Nan Hsu
%E Wee Sun Lee
%F pmlr-v20-couetoux11
%I PMLR
%P 19--31
%U https://proceedings.mlr.press/v20/couetoux11.html
%V 20
%X In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristics to continuous action and state spaces. The approach is experimentally validated on two artificial benchmark problems: the treasure hunt game, and a real-world energy management problem.

RIS


TY  - CPAPER
TI  - Continuous Rapid Action Value Estimates
AU  - Adrien Couëtoux
AU  - Mario Milone
AU  - Mátyás Brendel
AU  - Hassan Doghmen
AU  - Michèle Sebag
AU  - Olivier Teytaud
BT  - Proceedings of the Asian Conference on Machine Learning
DA  - 2011/11/17
ED  - Chun-Nan Hsu
ED  - Wee Sun Lee
ID  - pmlr-v20-couetoux11
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 20
SP  - 19
EP  - 31
L1  - http://proceedings.mlr.press/v20/couetoux11/couetoux11.pdf
UR  - https://proceedings.mlr.press/v20/couetoux11.html
AB  - In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence Tree algorithm to handle the exploration versus exploitation trade-off, while a few heuristics are used to guide the exploration in large search spaces. Among these heuristics is Rapid Action Value Estimate (RAVE). This paper is concerned with extending the RAVE heuristics to continuous action and state spaces. The approach is experimentally validated on two artificial benchmark problems: the treasure hunt game, and a real-world energy management problem.
ER  -

APA


Couëtoux, A., Milone, M., Brendel, M., Doghmen, H., Sebag, M. & Teytaud, O. (2011). Continuous Rapid Action Value Estimates. Proceedings of the Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 20:19-31. Available from https://proceedings.mlr.press/v20/couetoux11.html.
