Bayesian Reinforcement Learning via Deep, Sparse Sampling

Divya Grover; Debabrota Basu; Christos Dimitrakakis

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3036-3045, 2020.

Abstract

We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal as well as lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that in comparison to the state-of-the-art, our algorithm is both computationally more efficient, and obtains significantly higher reward over time in discrete environments.

Cite this Paper

BibTeX

@InProceedings{pmlr-v108-grover20a,
  title = 	 {Bayesian Reinforcement Learning via Deep, Sparse Sampling},
  author =       {Grover, Divya and Basu, Debabrota and Dimitrakakis, Christos},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3036--3045},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/grover20a/grover20a.pdf},
  url = 	 {https://proceedings.mlr.press/v108/grover20a.html},
  abstract = 	 { We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal as well as lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that in comparison to the state-of-the-art, our algorithm is both computationally more efficient, and obtains significantly higher reward over time in discrete environments.}
}

Endnote

%0 Conference Paper
%T Bayesian Reinforcement Learning via Deep, Sparse Sampling
%A Divya Grover
%A Debabrota Basu
%A Christos Dimitrakakis
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-grover20a
%I PMLR
%P 3036--3045
%U https://proceedings.mlr.press/v108/grover20a.html
%V 108
%X  We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal as well as lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that in comparison to the state-of-the-art, our algorithm is both computationally more efficient, and obtains significantly higher reward over time in discrete environments.

APA

Grover, D., Basu, D. & Dimitrakakis, C. (2020). Bayesian Reinforcement Learning via Deep, Sparse Sampling. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3036-3045. Available from https://proceedings.mlr.press/v108/grover20a.html.
