Bayesian Reinforcement Learning via Deep, Sparse Sampling

Divya Grover, Debabrota Basu, Christos Dimitrakakis
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3036-3045, 2020.

Abstract

We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal as well as lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that in comparison to the state-of-the-art, our algorithm is both computationally more efficient, and obtains significantly higher reward over time in discrete environments.

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-grover20a, title = {Bayesian Reinforcement Learning via Deep, Sparse Sampling}, author = {Grover, Divya and Basu, Debabrota and Dimitrakakis, Christos}, booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics}, pages = {3036--3045}, year = {2020}, editor = {Chiappa, Silvia and Calandra, Roberto}, volume = {108}, series = {Proceedings of Machine Learning Research}, month = {26--28 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v108/grover20a/grover20a.pdf}, url = {http://proceedings.mlr.press/v108/grover20a.html}, abstract = { We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal as well as lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that in comparison to the state-of-the-art, our algorithm is both computationally more efficient, and obtains significantly higher reward over time in discrete environments.} }
Endnote
%0 Conference Paper %T Bayesian Reinforcement Learning via Deep, Sparse Sampling %A Divya Grover %A Debabrota Basu %A Christos Dimitrakakis %B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2020 %E Silvia Chiappa %E Roberto Calandra %F pmlr-v108-grover20a %I PMLR %P 3036--3045 %U http://proceedings.mlr.press/v108/grover20a.html %V 108 %X We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm to induce deeper and sparser exploration with a theoretical bound on its performance relative to the Bayes optimal as well as lower computational complexity. The main novelty is the use of a candidate policy generator, to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that in comparison to the state-of-the-art, our algorithm is both computationally more efficient, and obtains significantly higher reward over time in discrete environments.
APA
Grover, D., Basu, D. & Dimitrakakis, C.. (2020). Bayesian Reinforcement Learning via Deep, Sparse Sampling. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3036-3045 Available from http://proceedings.mlr.press/v108/grover20a.html.

Related Material