A Workflow for Offline Model-Free Robotic Reinforcement Learning

Aviral Kumar, Anikait Singh, Stephen Tian, Chelsea Finn, Sergey Levine
Proceedings of the 5th Conference on Robot Learning, PMLR 164:417-428, 2022.

Abstract

Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction. This can allow robots to acquire generalizable skills from large and diverse datasets, without any costly or unsafe online data collection. Despite recent algorithmic advances in offline RL, applying these methods to real-world problems has proven challenging. Although offline RL methods can learn from prior data, there is no clear and well-understood process for making various design choices, from model architecture to algorithm hyperparameters, without actually evaluating the learned policies online. In this paper, our aim is to develop a practical workflow for using offline RL analogous to the relatively well-understood workflows for supervised learning problems. To this end, we devise a set of metrics and conditions that can be tracked over the course of offline training and can inform the practitioner about how the algorithm and model architecture should be adjusted to improve final performance. Our workflow is derived from a conceptual understanding of the behavior of conservative offline RL algorithms and cross-validation in supervised learning. We demonstrate the efficacy of this workflow in producing effective policies without any online tuning, both in several simulated robotic learning scenarios and for three tasks on two distinct real robots, focusing on learning manipulation skills with raw image observations and sparse binary rewards. Explanatory video and additional content can be found at https://sites.google.com/view/offline-rl-workflow.
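The workflow described above hinges on quantities that can be tracked during offline training without online rollouts. As one illustration (a minimal sketch, not the authors' released code), the snippet below logs the gap between Q-values on dataset actions and on actions sampled from the learned policy, the quantity that conservative methods such as CQL regularize; q_net, policy, and the batch layout are hypothetical stand-ins for whatever a given codebase provides.

    import torch

    def conservatism_metrics(q_net, policy, batch):
        """Sketch of per-iteration diagnostics for conservative offline RL.

        Assumes q_net(obs, actions) returns Q-values and policy(obs)
        returns a torch distribution over actions; both are hypothetical.
        """
        obs = batch["observations"]
        acts = batch["actions"]
        with torch.no_grad():
            q_data = q_net(obs, acts)             # Q-values on dataset actions
            pi_acts = policy(obs).sample()        # actions from the learned policy
            q_pi = q_net(obs, pi_acts)            # Q-values on policy actions
        return {
            "q_data_mean": q_data.mean().item(),
            "q_pi_mean": q_pi.mean().item(),
            # A negative gap indicates conservatism: the policy's Q-values
            # are being held below those of actions seen in the data.
            "conservatism_gap": (q_pi - q_data).mean().item(),
        }

Logged over the course of training, trends in such metrics are the kind of signal the paper proposes for deciding, without online evaluation, whether conservatism or model capacity should be adjusted.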

Cite this Paper

BibTeX
@InProceedings{pmlr-v164-kumar22a,
  title     = {A Workflow for Offline Model-Free Robotic Reinforcement Learning},
  author    = {Kumar, Aviral and Singh, Anikait and Tian, Stephen and Finn, Chelsea and Levine, Sergey},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {417--428},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/kumar22a/kumar22a.pdf},
  url       = {https://proceedings.mlr.press/v164/kumar22a.html}
}
Endnote
%0 Conference Paper
%T A Workflow for Offline Model-Free Robotic Reinforcement Learning
%A Aviral Kumar
%A Anikait Singh
%A Stephen Tian
%A Chelsea Finn
%A Sergey Levine
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-kumar22a
%I PMLR
%P 417--428
%U https://proceedings.mlr.press/v164/kumar22a.html
%V 164
APA
Kumar, A., Singh, A., Tian, S., Finn, C. & Levine, S. (2022). A Workflow for Offline Model-Free Robotic Reinforcement Learning. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:417-428. Available from https://proceedings.mlr.press/v164/kumar22a.html.
