Multi-objective Monte-Carlo Tree Search

Weijia Wang; Michèle Sebag

Proceedings of the Asian Conference on Machine Learning, PMLR 25:507-522, 2012.

Abstract

Concerned with multi-objective reinforcement learning (MORL), this paper presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. A well-known multi-objective indicator, the hyper-volume indicator, is used to define an action selection criterion that replaces the UCB criterion in order to deal with multi-dimensional rewards. MO-MCTS is first compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. A scalability study of MO-MCTS is then carried out on the NP-hard problem of grid scheduling, showing that the performance of MO-MCTS matches the non-RL-based state of the art, albeit at a higher computational cost.
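
For readers unfamiliar with the hyper-volume indicator mentioned in the abstract, the following is a minimal sketch of the generic indicator for two objectives under maximization. It is not the MO-MCTS selection rule itself, whose exact form is not given on this page. The function names `hypervolume_2d` and `hypervolume_gain`, the reference point `ref`, and the idea of scoring a candidate reward vector by its gain over an archive are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref` (both objectives maximized)."""
    # Only points that strictly improve on the reference in both objectives count.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:
            # This point adds a new strip above the best second objective seen so far.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def hypervolume_gain(archive: Sequence[Point], candidate: Point, ref: Point) -> float:
    """Hypothetical helper: increase in hypervolume if `candidate` joins `archive`."""
    return hypervolume_2d(list(archive) + [candidate], ref) - hypervolume_2d(archive, ref)
```

A candidate that is dominated by the archive has zero gain, while a candidate that extends the front has a strictly positive gain. This is what makes the indicator usable as a scalar score for vector-valued rewards.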

Cite this Paper

BibTeX


@InProceedings{pmlr-v25-wang12b,
  title = 	 {Multi-objective {M}onte-{C}arlo Tree Search},
  author = 	 {Wang, Weijia and Sebag, Michèle},
  booktitle = 	 {Proceedings of the Asian Conference on Machine Learning},
  pages = 	 {507--522},
  year = 	 {2012},
  editor = 	 {Hoi, Steven C. H. and Buntine, Wray},
  volume = 	 {25},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Singapore Management University, Singapore},
  month = 	 {04--06 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v25/wang12b/wang12b.pdf},
  url = 	 {https://proceedings.mlr.press/v25/wang12b.html},
  abstract = 	 {Concerned with multi-objective reinforcement learning (MORL), this paper presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The known multi-objective indicator referred to as hyper-volume indicator is used to define an action selection criterion, replacing the UCB criterion in order to deal with multi-dimensional rewards. MO-MCTS is firstly compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. Then a scalability study of MO-MCTS is made on the NP-hard problem of grid scheduling, showing that the performance of MO-MCTS matches the non RL-based state of the art albeit with a higher computational cost.}
}

Endnote

%0 Conference Paper
%T Multi-objective Monte-Carlo Tree Search
%A Weijia Wang
%A Michèle Sebag
%B Proceedings of the Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2012
%E Steven C. H. Hoi
%E Wray Buntine
%F pmlr-v25-wang12b
%I PMLR
%P 507--522
%U https://proceedings.mlr.press/v25/wang12b.html
%V 25
%X Concerned with multi-objective reinforcement learning (MORL), this paper presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The known multi-objective indicator referred to as hyper-volume indicator is used to define an action selection criterion, replacing the UCB criterion in order to deal with multi-dimensional rewards. MO-MCTS is firstly compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. Then a scalability study of MO-MCTS is made on the NP-hard problem of grid scheduling, showing that the performance of MO-MCTS matches the non RL-based state of the art albeit with a higher computational cost.

RIS


TY  - CPAPER
TI  - Multi-objective Monte-Carlo Tree Search
AU  - Weijia Wang
AU  - Michèle Sebag
BT  - Proceedings of the Asian Conference on Machine Learning
DA  - 2012/11/17
ED  - Steven C. H. Hoi
ED  - Wray Buntine
ID  - pmlr-v25-wang12b
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 25
SP  - 507
EP  - 522
L1  - http://proceedings.mlr.press/v25/wang12b/wang12b.pdf
UR  - https://proceedings.mlr.press/v25/wang12b.html
AB  - Concerned with multi-objective reinforcement learning (MORL), this paper presents MO-MCTS, an extension of Monte-Carlo Tree Search to multi-objective sequential decision making. The known multi-objective indicator referred to as hyper-volume indicator is used to define an action selection criterion, replacing the UCB criterion in order to deal with multi-dimensional rewards. MO-MCTS is firstly compared with an existing MORL algorithm on the artificial Deep Sea Treasure problem. Then a scalability study of MO-MCTS is made on the NP-hard problem of grid scheduling, showing that the performance of MO-MCTS matches the non RL-based state of the art albeit with a higher computational cost.
ER  -

APA


Wang, W. & Sebag, M. (2012). Multi-objective Monte-Carlo Tree Search. Proceedings of the Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 25:507-522. Available from https://proceedings.mlr.press/v25/wang12b.html.
