Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment

Philip J Ball, Cong Lu, Jack Parker-Holder, Stephen Roberts
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:619-629, 2021.

Abstract

Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration. Significant progress has been made in recent years in correcting for the differing behavior of the data-collection and learned policies. However, little attention has been paid to potentially changing dynamics when transferring a policy to the online setting, where the performance of existing methods can drop by up to 90%. In this paper we address this problem with Augmented World Models (AugWM). We augment a learned dynamics model with simple transformations that seek to capture potential changes in physical properties of the robot, leading to more robust policies. We not only train our policy in this new setting, but also provide it with the sampled augmentation as a context, allowing it to adapt to changes in the environment. At test time we learn the context in a self-supervised fashion by approximating the augmentation that corresponds to the new environment. We rigorously evaluate our approach on over 100 different changed-dynamics settings, and show that this simple approach can significantly improve the zero-shot generalization of a recent state-of-the-art baseline, often achieving successful policies where the baseline fails.
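
The abstract describes the method only at a high level. The sketch below is a minimal illustration of how the pieces could fit together, assuming a simple per-dimension scale-and-shift of the world model's predicted state delta as the augmentation family; that assumption, and all names used here (world_model.predict_delta, policy.act, sample_augmentation, estimate_context), are illustrative placeholders rather than the authors' implementation.

import numpy as np

def sample_augmentation(state_dim, scale_range=(0.8, 1.2), shift_scale=0.01):
    """Sample a random dynamics augmentation (the rollout's context)."""
    scale = np.random.uniform(*scale_range, size=state_dim)
    shift = np.random.uniform(-shift_scale, shift_scale, size=state_dim)
    return scale, shift

def augmented_step(world_model, state, action, context):
    """Predict the next state under the augmented dynamics."""
    scale, shift = context
    delta = world_model.predict_delta(state, action)  # learned estimate of s' - s
    return state + scale * delta + shift

def train_rollout(world_model, policy, init_state, horizon):
    """Roll out the policy inside the augmented world model.

    The sampled augmentation is also passed to the policy as a context
    vector, so the policy can learn to adapt to the dynamics change.
    """
    context = sample_augmentation(init_state.shape[0])
    state = init_state
    trajectory = []
    for _ in range(horizon):
        action = policy.act(state, np.concatenate(context))
        next_state = augmented_step(world_model, state, action, context)
        trajectory.append((state, action, next_state, context))
        state = next_state
    return trajectory

def estimate_context(world_model, transitions, lr=0.1, steps=100):
    """Self-supervised test-time context estimation (a sketch).

    Given real transitions (s, a, s') from the changed environment, fit the
    scale and shift that best map the model's predicted deltas onto the
    observed next states by gradient descent on the squared error.
    """
    state_dim = transitions[0][0].shape[0]
    scale = np.ones(state_dim)
    shift = np.zeros(state_dim)
    for _ in range(steps):
        for s, a, s_next in transitions:
            delta_pred = world_model.predict_delta(s, a)
            err = (s + scale * delta_pred + shift) - s_next
            scale -= lr * err * delta_pred  # gradient of 0.5 * err^2 w.r.t. scale
            shift -= lr * err               # gradient of 0.5 * err^2 w.r.t. shift
    return scale, shift

Under these assumptions, deployment would collect a short batch of real transitions in the new environment, call estimate_context on them, and feed the resulting scale/shift to policy.act as the context in place of a sampled one.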

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-ball21a,
  title     = {Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment},
  author    = {Ball, Philip J and Lu, Cong and Parker-Holder, Jack and Roberts, Stephen},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {619--629},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/ball21a/ball21a.pdf},
  url       = {https://proceedings.mlr.press/v139/ball21a.html},
  abstract  = {Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration. Significant progress has been made in the past few years in dealing with the challenge of correcting for differing behavior between the data collection and learned policies. However, little attention has been paid to potentially changing dynamics when transferring a policy to the online setting, where performance can be up to 90% reduced for existing methods. In this paper we address this problem with Augmented World Models (AugWM). We augment a learned dynamics model with simple transformations that seek to capture potential changes in physical properties of the robot, leading to more robust policies. We not only train our policy in this new setting, but also provide it with the sampled augmentation as a context, allowing it to adapt to changes in the environment. At test time we learn the context in a self-supervised fashion by approximating the augmentation which corresponds to the new environment. We rigorously evaluate our approach on over 100 different changed dynamics settings, and show that this simple approach can significantly improve the zero-shot generalization of a recent state-of-the-art baseline, often achieving successful policies where the baseline fails.}
}
Endnote
%0 Conference Paper
%T Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
%A Philip J Ball
%A Cong Lu
%A Jack Parker-Holder
%A Stephen Roberts
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-ball21a
%I PMLR
%P 619--629
%U https://proceedings.mlr.press/v139/ball21a.html
%V 139
%X Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration. Significant progress has been made in the past few years in dealing with the challenge of correcting for differing behavior between the data collection and learned policies. However, little attention has been paid to potentially changing dynamics when transferring a policy to the online setting, where performance can be up to 90% reduced for existing methods. In this paper we address this problem with Augmented World Models (AugWM). We augment a learned dynamics model with simple transformations that seek to capture potential changes in physical properties of the robot, leading to more robust policies. We not only train our policy in this new setting, but also provide it with the sampled augmentation as a context, allowing it to adapt to changes in the environment. At test time we learn the context in a self-supervised fashion by approximating the augmentation which corresponds to the new environment. We rigorously evaluate our approach on over 100 different changed dynamics settings, and show that this simple approach can significantly improve the zero-shot generalization of a recent state-of-the-art baseline, often achieving successful policies where the baseline fails.
APA
Ball, P.J., Lu, C., Parker-Holder, J. & Roberts, S. (2021). Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:619-629. Available from https://proceedings.mlr.press/v139/ball21a.html.
