Learning and Planning in Complex Action Spaces

Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, David Silver
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:4476-4486, 2021.

Abstract

Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-hubert21a, title = {Learning and Planning in Complex Action Spaces}, author = {Hubert, Thomas and Schrittwieser, Julian and Antonoglou, Ioannis and Barekatain, Mohammadamin and Schmitt, Simon and Silver, David}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {4476--4486}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/hubert21a/hubert21a.pdf}, url = {http://proceedings.mlr.press/v139/hubert21a.html}, abstract = {Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.} }
Endnote
%0 Conference Paper %T Learning and Planning in Complex Action Spaces %A Thomas Hubert %A Julian Schrittwieser %A Ioannis Antonoglou %A Mohammadamin Barekatain %A Simon Schmitt %A David Silver %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-hubert21a %I PMLR %P 4476--4486 %U http://proceedings.mlr.press/v139/hubert21a.html %V 139 %X Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.
APA
Hubert, T., Schrittwieser, J., Antonoglou, I., Barekatain, M., Schmitt, S. & Silver, D.. (2021). Learning and Planning in Complex Action Spaces. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:4476-4486 Available from http://proceedings.mlr.press/v139/hubert21a.html.

Related Material