On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search

Piyush Khandelwal; Elad Liebman; Scott Niekum; Peter Stone

On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search

Piyush Khandelwal, Elad Liebman, Scott Niekum, Peter Stone

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1319-1328, 2016.

Abstract

Over the past decade, Monte Carlo Tree Search (MCTS) and specifically Upper Confidence Bound in Trees (UCT) have proven to be quite effective in large probabilistic planning domains. In this paper, we focus on how values are backpropagated in the MCTS tree, and apply complex return strategies from the Reinforcement Learning (RL) literature to MCTS, producing 4 new MCTS variants. We demonstrate that in some probabilistic planning benchmarks from the International Planning Competition (IPC), selecting a MCTS variant with a backup strategy different from Monte Carlo averaging can lead to substantially better results. We also propose a hypothesis for why different backup strategies lead to different performance in particular environments, and manipulate a carefully structured grid-world domain to provide empirical evidence supporting our hypothesis.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-khandelwal16,
  title = 	 {On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search},
  author = 	 {Khandelwal, Piyush and Liebman, Elad and Niekum, Scott and Stone, Peter},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {1319--1328},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/khandelwal16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/khandelwal16.html},
  abstract = 	 {Over the past decade, Monte Carlo Tree Search (MCTS) and specifically Upper Confidence Bound in Trees (UCT) have proven to be quite effective in large probabilistic planning domains. In this paper, we focus on how values are backpropagated in the MCTS tree, and apply complex return strategies from the Reinforcement Learning (RL) literature to MCTS, producing 4 new MCTS variants. We demonstrate that in some probabilistic planning benchmarks from the International Planning Competition (IPC), selecting a MCTS variant with a backup strategy different from Monte Carlo averaging can lead to substantially better results. We also propose a hypothesis for why different backup strategies lead to different performance in particular environments, and manipulate a carefully structured grid-world domain to provide empirical evidence supporting our hypothesis.}
}

Endnote

%0 Conference Paper
%T On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search
%A Piyush Khandelwal
%A Elad Liebman
%A Scott Niekum
%A Peter Stone
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger	
%F pmlr-v48-khandelwal16
%I PMLR
%P 1319--1328
%U https://proceedings.mlr.press/v48/khandelwal16.html
%V 48
%X Over the past decade, Monte Carlo Tree Search (MCTS) and specifically Upper Confidence Bound in Trees (UCT) have proven to be quite effective in large probabilistic planning domains. In this paper, we focus on how values are backpropagated in the MCTS tree, and apply complex return strategies from the Reinforcement Learning (RL) literature to MCTS, producing 4 new MCTS variants. We demonstrate that in some probabilistic planning benchmarks from the International Planning Competition (IPC), selecting a MCTS variant with a backup strategy different from Monte Carlo averaging can lead to substantially better results. We also propose a hypothesis for why different backup strategies lead to different performance in particular environments, and manipulate a carefully structured grid-world domain to provide empirical evidence supporting our hypothesis.

RIS


TY  - CPAPER
TI  - On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search
AU  - Piyush Khandelwal
AU  - Elad Liebman
AU  - Scott Niekum
AU  - Peter Stone
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger	
ID  - pmlr-v48-khandelwal16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 1319
EP  - 1328
L1  - http://proceedings.mlr.press/v48/khandelwal16.pdf
UR  - https://proceedings.mlr.press/v48/khandelwal16.html
AB  - Over the past decade, Monte Carlo Tree Search (MCTS) and specifically Upper Confidence Bound in Trees (UCT) have proven to be quite effective in large probabilistic planning domains. In this paper, we focus on how values are backpropagated in the MCTS tree, and apply complex return strategies from the Reinforcement Learning (RL) literature to MCTS, producing 4 new MCTS variants. We demonstrate that in some probabilistic planning benchmarks from the International Planning Competition (IPC), selecting a MCTS variant with a backup strategy different from Monte Carlo averaging can lead to substantially better results. We also propose a hypothesis for why different backup strategies lead to different performance in particular environments, and manipulate a carefully structured grid-world domain to provide empirical evidence supporting our hypothesis.
ER  -

APA


Khandelwal, P., Liebman, E., Niekum, S. & Stone, P.. (2016). On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1319-1328 Available from https://proceedings.mlr.press/v48/khandelwal16.html.

Related Material

Download PDF