Constrained Contrastive Reinforcement Learning

Haoyu Wang, Xinrui Yang, Yuhang Wang, Lan Xuguang
Proceedings of The 14th Asian Conference on Machine Learning, PMLR 189:1070-1084, 2023.

Abstract

Learning to control from complex observations remains a major challenge in the application of model-based reinforcement learning (MBRL). Existing MBRL methods apply contrastive learning to replace pixel-level reconstruction, improving the performance of the latent world model. However, previous contrastive learning approaches in MBRL fail to utilize task-relevant information, making it difficult to aggregate observations with the same task-relevant information but different task-irrelevant information in latent space. In this work, we first propose Constrained Contrastive Reinforcement Learning (C2RL), an MBRL method that learns a world model through a combination of two contrastive losses, based on latent dynamics and on task-relevant state abstraction respectively, using reward information to accelerate model learning. We then introduce a hyperparameter $\beta$ to balance the two contrastive losses and strengthen the representation ability of the latent dynamics. Experimental results show that our approach outperforms state-of-the-art methods in both the natural-video and standard-background settings on challenging DMControl tasks.
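No code accompanies this page, but the core idea in the abstract, a $\beta$-weighted combination of two contrastive (InfoNCE-style) losses, can be illustrated in a few lines. This is a hedged sketch only: the function names, the exact InfoNCE form, and the weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.1):
    """Generic InfoNCE loss: row i of `keys` is the positive for row i of
    `queries`; all other rows serve as negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                   # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy vs. diagonal

def combined_loss(dynamics_loss, abstraction_loss, beta=0.5):
    """Beta-weighted combination of a dynamics-based and an abstraction-based
    contrastive term (the exact weighting used in C2RL may differ)."""
    return beta * dynamics_loss + (1.0 - beta) * abstraction_loss
```

In this reading, one InfoNCE term would contrast predicted against encoded latent states (latent dynamics), the other would contrast states sharing task-relevant (reward-related) structure, with `beta` trading off the two.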

Cite this Paper

BibTeX
@InProceedings{pmlr-v189-wang23a,
  title     = {Constrained Contrastive Reinforcement Learning},
  author    = {Wang, Haoyu and Yang, Xinrui and Wang, Yuhang and Xuguang, Lan},
  booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
  pages     = {1070--1084},
  year      = {2023},
  editor    = {Khan, Emtiyaz and Gonen, Mehmet},
  volume    = {189},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--14 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v189/wang23a/wang23a.pdf},
  url       = {https://proceedings.mlr.press/v189/wang23a.html},
  abstract  = {Learning to control from complex observations remains a major challenge in the application of model-based reinforcement learning (MBRL). Existing MBRL methods apply contrastive learning to replace pixel-level reconstruction, improving the performance of the latent world model. However, previous contrastive learning approaches in MBRL fail to utilize task-relevant information, making it difficult to aggregate observations with the same task-relevant information but different task-irrelevant information in latent space. In this work, we first propose Constrained Contrastive Reinforcement Learning (C2RL), an MBRL method that learns a world model through a combination of two contrastive losses, based on latent dynamics and on task-relevant state abstraction respectively, using reward information to accelerate model learning. We then introduce a hyperparameter $\beta$ to balance the two contrastive losses and strengthen the representation ability of the latent dynamics. Experimental results show that our approach outperforms state-of-the-art methods in both the natural-video and standard-background settings on challenging DMControl tasks.}
}
Endnote
%0 Conference Paper
%T Constrained Contrastive Reinforcement Learning
%A Haoyu Wang
%A Xinrui Yang
%A Yuhang Wang
%A Lan Xuguang
%B Proceedings of The 14th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Emtiyaz Khan
%E Mehmet Gonen
%F pmlr-v189-wang23a
%I PMLR
%P 1070--1084
%U https://proceedings.mlr.press/v189/wang23a.html
%V 189
%X Learning to control from complex observations remains a major challenge in the application of model-based reinforcement learning (MBRL). Existing MBRL methods apply contrastive learning to replace pixel-level reconstruction, improving the performance of the latent world model. However, previous contrastive learning approaches in MBRL fail to utilize task-relevant information, making it difficult to aggregate observations with the same task-relevant information but different task-irrelevant information in latent space. In this work, we first propose Constrained Contrastive Reinforcement Learning (C2RL), an MBRL method that learns a world model through a combination of two contrastive losses, based on latent dynamics and on task-relevant state abstraction respectively, using reward information to accelerate model learning. We then introduce a hyperparameter $\beta$ to balance the two contrastive losses and strengthen the representation ability of the latent dynamics. Experimental results show that our approach outperforms state-of-the-art methods in both the natural-video and standard-background settings on challenging DMControl tasks.
APA
Wang, H., Yang, X., Wang, Y. & Xuguang, L. (2023). Constrained Contrastive Reinforcement Learning. Proceedings of The 14th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 189:1070-1084. Available from https://proceedings.mlr.press/v189/wang23a.html.