Decoupling Representation Learning from Reinforcement Learning

Adam Stooke; Kimin Lee; Pieter Abbeel; Michael Laskin

Decoupling Representation Learning from Reinforcement Learning

Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9870-9879, 2021.

Abstract

In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at \url{https://github.com/astooke/rlpyt/tree/master/rlpyt/ul}.

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-stooke21a,
  title = 	 {Decoupling Representation Learning from Reinforcement Learning},
  author =       {Stooke, Adam and Lee, Kimin and Abbeel, Pieter and Laskin, Michael},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {9870--9879},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/stooke21a/stooke21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/stooke21a.html},
  abstract = 	 {In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at \url{https://github.com/astooke/rlpyt/tree/master/rlpyt/ul}.}
}

Endnote

%0 Conference Paper
%T Decoupling Representation Learning from Reinforcement Learning
%A Adam Stooke
%A Kimin Lee
%A Pieter Abbeel
%A Michael Laskin
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-stooke21a
%I PMLR
%P 9870--9879
%U https://proceedings.mlr.press/v139/stooke21a.html
%V 139
%X In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at \url{https://github.com/astooke/rlpyt/tree/master/rlpyt/ul}.

APA

Stooke, A., Lee, K., Abbeel, P. & Laskin, M.. (2021). Decoupling Representation Learning from Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9870-9879 Available from https://proceedings.mlr.press/v139/stooke21a.html.

Decoupling Representation Learning from Reinforcement Learning

Abstract

Cite this Paper

Related Material