Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

Kevin Frans; Seohong Park; Pieter Abbeel; Sergey Levine

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:13927-13942, 2024.

Abstract

Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream tasks in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-frans24a,
  title = 	 {Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings},
  author =       {Frans, Kevin and Park, Seohong and Abbeel, Pieter and Levine, Sergey},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {13927--13942},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/frans24a/frans24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/frans24a.html},
  abstract = 	 {Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream tasks in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods.}
}

Endnote

%0 Conference Paper
%T Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
%A Kevin Frans
%A Seohong Park
%A Pieter Abbeel
%A Sergey Levine
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-frans24a
%I PMLR
%P 13927--13942
%U https://proceedings.mlr.press/v235/frans24a.html
%V 235
%X Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of any arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream tasks in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods.

APA


Frans, K., Park, S., Abbeel, P. & Levine, S. (2024). Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:13927-13942. Available from https://proceedings.mlr.press/v235/frans24a.html.
