Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning

Haoqi Yuan, Zongqing Lu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:25747-25759, 2022.

Abstract

We study offline meta-reinforcement learning, a practical reinforcement learning paradigm that learns from offline data to adapt to new tasks. The distribution of offline data is determined jointly by the behavior policy and the task. Existing offline meta-reinforcement learning algorithms cannot distinguish these factors, making task representations unstable under changes in the behavior policy. To address this problem, we propose a contrastive learning framework for task representations that are robust to the distribution mismatch between training and test behavior policies. We design a bi-level encoder structure, use mutual information maximization to formalize task representation learning, derive a contrastive learning objective, and introduce several approaches to approximate the true distribution of negative pairs. Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods, especially in generalizing to out-of-distribution behavior policies.
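The abstract describes deriving a contrastive objective from mutual information maximization, with batches from the same task serving as positive pairs and other tasks (or behavior policies) as negatives. The sketch below is an illustrative InfoNCE-style loss for a task encoder, not the authors' released implementation; the `TaskEncoder` architecture, temperature, and batch construction are assumptions for demonstration only.

```python
# Illustrative InfoNCE-style contrastive loss for task representations.
# This is a generic sketch, NOT the paper's code: the encoder architecture,
# temperature, and batching scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskEncoder(nn.Module):
    """Maps a batch of transitions (s, a, r, s') from one task to a task embedding."""

    def __init__(self, transition_dim: int, z_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 128), nn.ReLU(),
            nn.Linear(128, z_dim),
        )

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: (num_tasks, num_transitions, transition_dim)
        # Mean-pool per-transition features into one embedding per task.
        return self.net(transitions).mean(dim=1)


def info_nce_loss(z_query: torch.Tensor, z_key: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """Row i of z_query should match row i of z_key (same task);
    all other rows act as negatives (other tasks / behavior policies)."""
    z_query = F.normalize(z_query, dim=-1)
    z_key = F.normalize(z_key, dim=-1)
    logits = z_query @ z_key.t() / temperature        # (num_tasks, num_tasks)
    labels = torch.arange(z_query.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)


# Usage sketch: two disjoint transition batches per task form a positive pair.
encoder = TaskEncoder(transition_dim=10)
batch_a = torch.randn(8, 32, 10)   # 8 tasks, 32 transitions each
batch_b = torch.randn(8, 32, 10)   # second batch drawn from the same 8 tasks
loss = info_nce_loss(encoder(batch_a), encoder(batch_b))
loss.backward()
```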

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-yuan22a,
  title     = {Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning},
  author    = {Yuan, Haoqi and Lu, Zongqing},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {25747--25759},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/yuan22a/yuan22a.pdf},
  url       = {https://proceedings.mlr.press/v162/yuan22a.html},
  abstract  = {We study offline meta-reinforcement learning, a practical reinforcement learning paradigm that learns from offline data to adapt to new tasks. The distribution of offline data is determined jointly by the behavior policy and the task. Existing offline meta-reinforcement learning algorithms cannot distinguish these factors, making task representations unstable under changes in the behavior policy. To address this problem, we propose a contrastive learning framework for task representations that are robust to the distribution mismatch between training and test behavior policies. We design a bi-level encoder structure, use mutual information maximization to formalize task representation learning, derive a contrastive learning objective, and introduce several approaches to approximate the true distribution of negative pairs. Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods, especially in generalizing to out-of-distribution behavior policies.}
}
Endnote
%0 Conference Paper
%T Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning
%A Haoqi Yuan
%A Zongqing Lu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-yuan22a
%I PMLR
%P 25747--25759
%U https://proceedings.mlr.press/v162/yuan22a.html
%V 162
%X We study offline meta-reinforcement learning, a practical reinforcement learning paradigm that learns from offline data to adapt to new tasks. The distribution of offline data is determined jointly by the behavior policy and the task. Existing offline meta-reinforcement learning algorithms cannot distinguish these factors, making task representations unstable under changes in the behavior policy. To address this problem, we propose a contrastive learning framework for task representations that are robust to the distribution mismatch between training and test behavior policies. We design a bi-level encoder structure, use mutual information maximization to formalize task representation learning, derive a contrastive learning objective, and introduce several approaches to approximate the true distribution of negative pairs. Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods, especially in generalizing to out-of-distribution behavior policies.
APA
Yuan, H. & Lu, Z. (2022). Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:25747-25759. Available from https://proceedings.mlr.press/v162/yuan22a.html.
