Monte-Carlo Graph Search: the Value of Merging Similar States

Edouard Leurent, Odalric-Ambrym Maillard
; Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:577-592, 2020.

Abstract

We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and limited computational budget. Despite the underlying MDP transitions having a graph structure, the popular Monte-Carlo Tree Search algorithms such as UCT rely on a tree structure to represent their value estimates. That is, they do not identify together two similar states reached via different trajectories and represented in separate branches of the tree. In this work, we propose a graph-based planning algorithm, which takes into account this state similarity. In our analysis, we provide a regret bound that depends on a novel problem-dependent measure of difficulty, which improves on the original tree-based bound in MDPs where the trajectories overlap, and recovers it otherwise. Then, we show that this methodology can be adapted to existing planning algorithms that deal with stochastic systems. Finally, numerical simulations illustrate the benefits of our approach.

Cite this Paper


BibTeX
@InProceedings{pmlr-v129-leurent20a, title = {Monte-Carlo Graph Search: the Value of Merging Similar States}, author = {Leurent, Edouard and Maillard, Odalric-Ambrym}, booktitle = {Proceedings of The 12th Asian Conference on Machine Learning}, pages = {577--592}, year = {2020}, editor = {Sinno Jialin Pan and Masashi Sugiyama}, volume = {129}, series = {Proceedings of Machine Learning Research}, address = {Bangkok, Thailand}, month = {18--20 Nov}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v129/leurent20a/leurent20a.pdf}, url = {http://proceedings.mlr.press/v129/leurent20a.html}, abstract = {We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and limited computational budget. Despite the underlying MDP transitions having a graph structure, the popular Monte-Carlo Tree Search algorithms such as UCT rely on a tree structure to represent their value estimates. That is, they do not identify together two similar states reached via different trajectories and represented in separate branches of the tree. In this work, we propose a graph-based planning algorithm, which takes into account this state similarity. In our analysis, we provide a regret bound that depends on a novel problem-dependent measure of difficulty, which improves on the original tree-based bound in MDPs where the trajectories overlap, and recovers it otherwise. Then, we show that this methodology can be adapted to existing planning algorithms that deal with stochastic systems. Finally, numerical simulations illustrate the benefits of our approach.} }
Endnote
%0 Conference Paper %T Monte-Carlo Graph Search: the Value of Merging Similar States %A Edouard Leurent %A Odalric-Ambrym Maillard %B Proceedings of The 12th Asian Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Sinno Jialin Pan %E Masashi Sugiyama %F pmlr-v129-leurent20a %I PMLR %J Proceedings of Machine Learning Research %P 577--592 %U http://proceedings.mlr.press %V 129 %W PMLR %X We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and limited computational budget. Despite the underlying MDP transitions having a graph structure, the popular Monte-Carlo Tree Search algorithms such as UCT rely on a tree structure to represent their value estimates. That is, they do not identify together two similar states reached via different trajectories and represented in separate branches of the tree. In this work, we propose a graph-based planning algorithm, which takes into account this state similarity. In our analysis, we provide a regret bound that depends on a novel problem-dependent measure of difficulty, which improves on the original tree-based bound in MDPs where the trajectories overlap, and recovers it otherwise. Then, we show that this methodology can be adapted to existing planning algorithms that deal with stochastic systems. Finally, numerical simulations illustrate the benefits of our approach.
APA
Leurent, E. & Maillard, O.. (2020). Monte-Carlo Graph Search: the Value of Merging Similar States. Proceedings of The 12th Asian Conference on Machine Learning, in PMLR 129:577-592

Related Material