Learning in POMDPs with Monte Carlo Tree Search

Sammie Katt; Frans A. Oliehoek; Christopher Amato

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1819-1827, 2017.

Abstract

The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-katt17a,
  title     = {Learning in {POMDP}s with {M}onte {C}arlo Tree Search},
  author    = {Sammie Katt and Frans A. Oliehoek and Christopher Amato},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {1819--1827},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/katt17a/katt17a.pdf},
  url       = {https://proceedings.mlr.press/v70/katt17a.html},
  abstract = 	 {The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.}
}

Endnote

%0 Conference Paper
%T Learning in POMDPs with Monte Carlo Tree Search
%A Sammie Katt
%A Frans A. Oliehoek
%A Christopher Amato
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-katt17a
%I PMLR
%P 1819--1827
%U https://proceedings.mlr.press/v70/katt17a.html
%V 70
%X The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP along with proof of their convergence.

APA


Katt, S., Oliehoek, F. A., & Amato, C. (2017). Learning in POMDPs with Monte Carlo Tree Search. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:1819-1827. Available from https://proceedings.mlr.press/v70/katt17a.html.
