Multi-objective Monte-Carlo Tree Search

Weijia Wang, Michèle Sebag
Proceedings of the Asian Conference on Machine Learning, PMLR 25:507-522, 2012.

Abstract

Concerned with multi-objective reinforcement learning (MORL), this paper presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The well-known multi-objective hypervolume indicator is used to define an action selection criterion that replaces the UCB criterion, so as to handle multi-dimensional rewards. MO-MCTS is first compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. A scalability study of MO-MCTS is then conducted on the NP-hard problem of grid scheduling, showing that MO-MCTS matches the performance of the non-RL-based state of the art, albeit at a higher computational cost.
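To make the idea of a hypervolume-based selection rule concrete, here is a minimal sketch in Python. It assumes two objectives, both maximised, and a fixed reference point. Each child's mean reward vector gets a UCB1-style bonus, and the child is scored by how much hypervolume its optimistic vector would add to an archive of reward vectors found so far. The names `pareto_front`, `hypervolume_2d`, `ucb_vector` and `hv_select`, the shared per-objective bonus, and the "hypervolume gain" score are illustrative assumptions, not the paper's exact criterion. That criterion may differ, for example in how it ranks dominated vectors, which here simply score zero.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both maximised


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Non-dominated subset, ordered by first objective descending."""
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    front: List[Point] = []
    best_second = -math.inf
    for p in ordered:
        # Every earlier point has a first objective >= p[0], so p survives
        # only if it strictly beats all of them on the second objective.
        if p[1] > best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by the reference point `ref`."""
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    area = 0.0
    prev_second = ref[1]
    # Along the front the second objective increases, so sum horizontal strips.
    for x, y in pareto_front(pts):
        area += (x - ref[0]) * (y - prev_second)
        prev_second = y
    return area


def ucb_vector(mean: Point, n_parent: int, n_child: int, c: float) -> Point:
    """Optimistic reward vector: the same UCB1-style bonus added to each objective."""
    bonus = c * math.sqrt(math.log(n_parent) / n_child)
    return (mean[0] + bonus, mean[1] + bonus)


def hv_select(children, archive, ref, n_parent, c=1.0):
    """Index of the child whose optimistic vector adds most hypervolume to `archive`.

    `children` is a list of (mean_reward_vector, visit_count) pairs; every child
    is assumed to have been visited at least once. Dominated vectors gain 0, and
    ties go to the lowest index (hypothetical tie-breaking rule).
    """
    base = hypervolume_2d(archive, ref)

    def gain(i: int) -> float:
        mean, n = children[i]
        u = ucb_vector(mean, n_parent, n, c)
        return hypervolume_2d(list(archive) + [u], ref) - base

    return max(range(len(children)), key=gain)


if __name__ == "__main__":
    archive = [(1.0, 3.0), (3.0, 1.0)]
    children = [((2.0, 2.0), 10), ((0.5, 0.5), 10)]
    print(hypervolume_2d(archive, (0.0, 0.0)))  # 5.0
    print(hv_select(children, archive, (0.0, 0.0), n_parent=20, c=0.0))  # 0
```

With `c = 0.0` the rule is purely greedy on hypervolume gain. Here child 0 wins because (2, 2) extends the front, while (0.5, 0.5) is dominated. Raising `c` inflates every objective of rarely visited children, playing the same role as the exploration constant in scalar UCB.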

Cite this Paper


BibTeX
@InProceedings{pmlr-v25-wang12b,
  title = {Multi-objective {M}onte-{C}arlo Tree Search},
  author = {Wang, Weijia and Sebag, Michèle},
  booktitle = {Proceedings of the Asian Conference on Machine Learning},
  pages = {507--522},
  year = {2012},
  editor = {Hoi, Steven C. H. and Buntine, Wray},
  volume = {25},
  series = {Proceedings of Machine Learning Research},
  address = {Singapore Management University, Singapore},
  month = {04--06 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v25/wang12b/wang12b.pdf},
  url = {https://proceedings.mlr.press/v25/wang12b.html},
  abstract = {Concerned with multi-objective reinforcement learning (MORL), this paper presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The known multi-objective indicator referred to as hyper-volume indicator is used to define an action selection criterion, replacing the UCB criterion in order to deal with multi-dimensional rewards. MO-MCTS is firstly compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. Then a scalability study of MO-MCTS is made on the NP-hard problem of grid scheduling, showing that the performance of MO-MCTS matches the non RL-based state of the art albeit with a higher computational cost.}
}
Endnote
%0 Conference Paper
%T Multi-objective Monte-Carlo Tree Search
%A Weijia Wang
%A Michèle Sebag
%B Proceedings of the Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2012
%E Steven C. H. Hoi
%E Wray Buntine
%F pmlr-v25-wang12b
%I PMLR
%P 507--522
%U https://proceedings.mlr.press/v25/wang12b.html
%V 25
%X Concerned with multi-objective reinforcement learning (MORL), this paper presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The known multi-objective indicator referred to as hyper-volume indicator is used to define an action selection criterion, replacing the UCB criterion in order to deal with multi-dimensional rewards. MO-MCTS is firstly compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. Then a scalability study of MO-MCTS is made on the NP-hard problem of grid scheduling, showing that the performance of MO-MCTS matches the non RL-based state of the art albeit with a higher computational cost.
RIS
TY  - CPAPER
TI  - Multi-objective Monte-Carlo Tree Search
AU  - Weijia Wang
AU  - Michèle Sebag
BT  - Proceedings of the Asian Conference on Machine Learning
DA  - 2012/11/17
ED  - Steven C. H. Hoi
ED  - Wray Buntine
ID  - pmlr-v25-wang12b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 25
SP  - 507
EP  - 522
L1  - http://proceedings.mlr.press/v25/wang12b/wang12b.pdf
UR  - https://proceedings.mlr.press/v25/wang12b.html
AB  - Concerned with multi-objective reinforcement learning (MORL), this paper presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The known multi-objective indicator referred to as hyper-volume indicator is used to define an action selection criterion, replacing the UCB criterion in order to deal with multi-dimensional rewards. MO-MCTS is firstly compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. Then a scalability study of MO-MCTS is made on the NP-hard problem of grid scheduling, showing that the performance of MO-MCTS matches the non RL-based state of the art albeit with a higher computational cost.
ER  - 
APA
Wang, W. & Sebag, M. (2012). Multi-objective Monte-Carlo Tree Search. Proceedings of the Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 25:507-522. Available from https://proceedings.mlr.press/v25/wang12b.html.
