Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:13927-13942, 2024.

Abstract

Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream tasks in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods.
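The abstract describes the method only at a high level. The sketch below is an illustrative reading of it, not the authors' code: it assumes a permutation-invariant transformer encoder that maps a set of (state, reward) samples for a task to a VAE latent z, a decoder that reconstructs rewards for held-out query states from z, and (not shown) a z-conditioned offline RL policy trained on the same latents. All class names, layer sizes, and hyperparameters are placeholder choices.

# Illustrative sketch of a functional reward encoding (FRE); NOT the paper's implementation.
# Assumptions: transformer-encoded (state, reward) context sets, a VAE-style latent z,
# and an MLP decoder predicting rewards of query states from z. Dimensions are placeholders.
import torch
import torch.nn as nn

class FREEncoder(nn.Module):
    def __init__(self, state_dim, latent_dim=32, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim + 1, d_model)            # embed (state, reward) pairs
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)

    def forward(self, states, rewards):
        # states: (B, K, state_dim), rewards: (B, K) -- K reward-annotated samples of a task
        x = torch.cat([states, rewards.unsqueeze(-1)], dim=-1)
        h = self.transformer(self.embed(x)).mean(dim=1)           # permutation-invariant pooling
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterization trick
        return z, mu, logvar

class FREDecoder(nn.Module):
    """Predicts the reward of a query state given the task latent z."""
    def __init__(self, state_dim, latent_dim=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, query_states, z):
        # query_states: (B, Q, state_dim), z: (B, latent_dim)
        z = z.unsqueeze(1).expand(-1, query_states.shape[1], -1)
        return self.net(torch.cat([query_states, z], dim=-1)).squeeze(-1)

def fre_loss(encoder, decoder, ctx_states, ctx_rewards, qry_states, qry_rewards, beta=1e-3):
    # Variational objective: reconstruct rewards of held-out query states from the latent
    # inferred from the context samples, with a KL penalty toward N(0, I).
    z, mu, logvar = encoder(ctx_states, ctx_rewards)
    recon = ((decoder(qry_states, z) - qry_rewards) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return recon + beta * kl

Under this reading, zero-shot adaptation would amount to encoding a small set of reward-annotated samples from a new task into z and executing the pre-trained z-conditioned policy without any further training.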

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-frans24a,
  title     = {Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings},
  author    = {Frans, Kevin and Park, Seohong and Abbeel, Pieter and Levine, Sergey},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {13927--13942},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/frans24a/frans24a.pdf},
  url       = {https://proceedings.mlr.press/v235/frans24a.html}
}
Endnote
%0 Conference Paper
%T Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
%A Kevin Frans
%A Seohong Park
%A Pieter Abbeel
%A Sergey Levine
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-frans24a
%I PMLR
%P 13927--13942
%U https://proceedings.mlr.press/v235/frans24a.html
%V 235
APA
Frans, K., Park, S., Abbeel, P. & Levine, S. (2024). Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:13927-13942. Available from https://proceedings.mlr.press/v235/frans24a.html.