Simplified Temporal Consistency Reinforcement Learning

Yi Zhao; Wenshuai Zhao; Rinu Boney; Juho Kannala; Joni Pajarinen

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:42227-42246, 2023.

Abstract

Reinforcement learning (RL) is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper, we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1$\times$ faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods’ sample efficiency while training 2.4$\times$ faster.
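To make "a latent dynamics model trained by latent temporal consistency" concrete, the following is a minimal sketch of what such an objective could look like. It is not the authors' implementation: the single-layer tanh encoder, the tanh dynamics function, the negative cosine similarity, the discount `rho`, and the slowly updated target encoder (`W_target`, `ema_update`) are all assumptions chosen for illustration, and every function name is hypothetical. The sketch only evaluates the loss; an actual training loop would minimize it with a gradient-based optimizer and treat the target latents as constants.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def encode(obs, W):
    """Hypothetical encoder: one linear layer followed by tanh."""
    return [math.tanh(h) for h in matvec(W, obs)]


def predict_next(z, action, A, B):
    """Hypothetical latent dynamics: next latent from current latent and action."""
    return [math.tanh(p + q) for p, q in zip(matvec(A, z), matvec(B, action))]


def cosine_similarity(x, y, eps=1e-8):
    """Cosine similarity between two vectors; eps guards against zero norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y + eps)


def temporal_consistency_loss(obs_seq, act_seq, W, W_target, A, B, rho=0.9):
    """Discounted consistency loss over one trajectory segment.

    obs_seq holds H + 1 observations and act_seq holds H actions.
    The first observation is encoded once, the latent is rolled forward
    with the dynamics model, and each predicted latent is compared with
    the target encoder's latent of the observation actually reached.
    """
    z = encode(obs_seq[0], W)
    loss = 0.0
    for t, action in enumerate(act_seq):
        z = predict_next(z, action, A, B)          # predict in latent space
        target = encode(obs_seq[t + 1], W_target)  # treated as a constant in training
        loss += (rho ** t) * -cosine_similarity(z, target)
    return loss


def ema_update(W_target, W, tau=0.01):
    """Move the target encoder weights a small step toward the online weights."""
    return [
        [(1.0 - tau) * wt + tau * w for wt, w in zip(row_t, row)]
        for row_t, row in zip(W_target, W)
    ]
```

In this sketch the loss is lowest when every rolled-out latent points in the same direction as the target latent of the corresponding real observation, so no reconstruction of the observation itself is needed. The target encoder is one common way to keep such an objective from collapsing to a constant representation; whether and how the paper does this should be checked against the paper itself.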

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-zhao23k,
  title = 	 {Simplified Temporal Consistency Reinforcement Learning},
  author =       {Zhao, Yi and Zhao, Wenshuai and Boney, Rinu and Kannala, Juho and Pajarinen, Joni},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {42227--42246},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/zhao23k/zhao23k.pdf},
  url = 	 {https://proceedings.mlr.press/v202/zhao23k.html},
  abstract = 	 {Reinforcement learning (RL) is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and, self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but, also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1$\times$ faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks, such as the Deepmind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods’ sample efficiency while training 2.4$\times$ faster.}
}

Endnote

%0 Conference Paper
%T Simplified Temporal Consistency Reinforcement Learning
%A Yi Zhao
%A Wenshuai Zhao
%A Rinu Boney
%A Juho Kannala
%A Joni Pajarinen
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-zhao23k
%I PMLR
%P 42227--42246
%U https://proceedings.mlr.press/v202/zhao23k.html
%V 202
%X Reinforcement learning (RL) is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and, self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but, also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1$\times$ faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks, such as the Deepmind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods’ sample efficiency while training 2.4$\times$ faster.

APA


Zhao, Y., Zhao, W., Boney, R., Kannala, J. & Pajarinen, J. (2023). Simplified Temporal Consistency Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:42227-42246. Available from https://proceedings.mlr.press/v202/zhao23k.html.
