Minimax-Bayes Reinforcement Learning

Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya Grover, Emilio Jorge
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7511-7527, 2023.

Abstract

While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
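A minimal sketch of the underlying objective, with notation assumed here for illustration rather than taken from the paper: let U(β, π) denote the expected utility of a policy π when the environment is drawn from a prior β, so that the Bayes-optimal value of a prior is max_π U(β, π). The worst-case prior and the corresponding minimax-Bayes policy then solve

\[
\beta^{*} \in \arg\min_{\beta}\, \max_{\pi}\, U(\beta, \pi),
\qquad
\pi^{*} \in \arg\max_{\pi}\, \min_{\beta}\, U(\beta, \pi),
\]

with the two values coinciding whenever a minimax theorem applies to the game between prior and policy.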

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-buening23a,
  title     = {Minimax-Bayes Reinforcement Learning},
  author    = {Buening, Thomas Kleine and Dimitrakakis, Christos and Eriksson, Hannes and Grover, Divya and Jorge, Emilio},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {7511--7527},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/buening23a/buening23a.pdf},
  url       = {https://proceedings.mlr.press/v206/buening23a.html},
  abstract  = {While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.}
}
Endnote
%0 Conference Paper
%T Minimax-Bayes Reinforcement Learning
%A Thomas Kleine Buening
%A Christos Dimitrakakis
%A Hannes Eriksson
%A Divya Grover
%A Emilio Jorge
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-buening23a
%I PMLR
%P 7511--7527
%U https://proceedings.mlr.press/v206/buening23a.html
%V 206
%X While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution. One idea is to employ a worst-case prior. However, this is not as easy to specify in sequential decision making as in simple statistical estimation problems. This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems to gain insights into the properties of the corresponding priors and policies. We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
APA
Buening, T.K., Dimitrakakis, C., Eriksson, H., Grover, D. & Jorge, E. (2023). Minimax-Bayes Reinforcement Learning. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:7511-7527. Available from https://proceedings.mlr.press/v206/buening23a.html.