Offline Actor-Critic Reinforcement Learning Scales to Large Models

Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Maria Elisabeth Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:46323-46350, 2024.

Abstract

We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows scaling laws similar to those of supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor-critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-springenberg24a,
  title     = {Offline Actor-Critic Reinforcement Learning Scales to Large Models},
  author    = {Springenberg, Jost Tobias and Abdolmaleki, Abbas and Zhang, Jingwei and Groth, Oliver and Bloesch, Michael and Lampe, Thomas and Brakel, Philemon and Bechtle, Sarah Maria Elisabeth and Kapturowski, Steven and Hafner, Roland and Heess, Nicolas and Riedmiller, Martin},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {46323--46350},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/springenberg24a/springenberg24a.pdf},
  url       = {https://proceedings.mlr.press/v235/springenberg24a.html},
  abstract  = {We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset; containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.}
}
Endnote
%0 Conference Paper
%T Offline Actor-Critic Reinforcement Learning Scales to Large Models
%A Jost Tobias Springenberg
%A Abbas Abdolmaleki
%A Jingwei Zhang
%A Oliver Groth
%A Michael Bloesch
%A Thomas Lampe
%A Philemon Brakel
%A Sarah Maria Elisabeth Bechtle
%A Steven Kapturowski
%A Roland Hafner
%A Nicolas Heess
%A Martin Riedmiller
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-springenberg24a
%I PMLR
%P 46323--46350
%U https://proceedings.mlr.press/v235/springenberg24a.html
%V 235
%X We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset; containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.
APA
Springenberg, J.T., Abdolmaleki, A., Zhang, J., Groth, O., Bloesch, M., Lampe, T., Brakel, P., Bechtle, S.M.E., Kapturowski, S., Hafner, R., Heess, N., & Riedmiller, M. (2024). Offline Actor-Critic Reinforcement Learning Scales to Large Models. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:46323-46350. Available from https://proceedings.mlr.press/v235/springenberg24a.html.

Related Material