Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Norman Di Palo, Leonard Hasenclever, Jan Humplik, Arunkumar Byravan
Proceedings of The 3rd Conference on Lifelong Learning Agents, PMLR 274:268-284, 2025.

Abstract

We address the problem of sample-efficiency when training instruction-following embodied agents using reinforcement learning in a lifelong setting, where rewards may be sparse or absent. Our framework, which we call Diffusion Augmented Agent (DAAG), leverages a large language model (LLM), a vision language model (VLM), and a pipeline for using image diffusion models for temporally and geometrically consistent conditional video generation to relabel the agent’s past experience in hindsight. Given a video-instruction pair and a target instruction, we ask the LLM whether our diffusion model could transform the video into one consistent with the target instruction, and, if so, we apply this transformation. We use such hindsight data augmentation to decrease 1) the amount of data needed to fine-tune a VLM that acts as a reward detector and 2) the amount of reward-labelled data needed for RL training. The LLM orchestrates this process, making the entire framework autonomous and independent of human supervision, and hence particularly suited to lifelong reinforcement learning scenarios. We empirically demonstrate gains in sample-efficiency when training in simulated robotics environments, including manipulation and navigation tasks, showing improvements in learning reward detectors, transferring past experience, and learning new tasks, all key abilities for efficient lifelong learning agents.
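The hindsight relabeling loop the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: `can_transform` stands in for the LLM query ("could the diffusion model turn this video into one consistent with the target instruction?") and `transform` stands in for the diffusion video-editing pipeline; both names are hypothetical.

```python
def daag_relabel(buffer, target_instruction, can_transform, transform):
    """Hindsight-relabel past (video, instruction) pairs toward a target instruction.

    For each stored episode, an LLM-like predicate decides whether the
    diffusion model could make its video consistent with the target
    instruction; if so, the transformed video is kept as synthetic
    training data paired with the target instruction.
    """
    augmented = []
    for video, instruction in buffer:
        if can_transform(video, instruction, target_instruction):
            new_video = transform(video, target_instruction)
            augmented.append((new_video, target_instruction))
    return augmented


# Toy usage with stand-in callables (strings play the role of videos).
buffer = [("vid_red_cube", "pick up the red cube"),
          ("vid_drive", "drive to the wall")]
can = lambda v, src, tgt: "cube" in src and "cube" in tgt
tf = lambda v, tgt: v + "_recolored"
print(daag_relabel(buffer, "pick up the blue cube", can, tf))
# → [('vid_red_cube_recolored', 'pick up the blue cube')]
```

The synthetic pairs produced this way can then feed both uses named in the abstract: extra positives for fine-tuning the VLM reward detector, and extra reward-labelled trajectories for RL training.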

Cite this Paper


BibTeX
@InProceedings{pmlr-v274-palo25a,
  title     = {Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning},
  author    = {Palo, Norman Di and Hasenclever, Leonard and Humplik, Jan and Byravan, Arunkumar},
  booktitle = {Proceedings of The 3rd Conference on Lifelong Learning Agents},
  pages     = {268--284},
  year      = {2025},
  editor    = {Lomonaco, Vincenzo and Melacci, Stefano and Tuytelaars, Tinne and Chandar, Sarath and Pascanu, Razvan},
  volume    = {274},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Jul--01 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v274/main/assets/palo25a/palo25a.pdf},
  url       = {https://proceedings.mlr.press/v274/palo25a.html},
  abstract  = {We address the problem of sample-efficiency when training instruction-following embodied agents using reinforcement learning in a lifelong setting, where rewards may be sparse or absent. Our framework, which we call Diffusion Augmented Agent (DAAG), leverages a large language model (LLM), a vision language model (VLM), and a pipeline for using image diffusion models for temporally and geometrically consistent conditional video generation to relabel the agent’s past experience in hindsight. Given a video-instruction pair and a target instruction, we ask the LLM whether our diffusion model could transform the video into one consistent with the target instruction, and, if so, we apply this transformation. We use such hindsight data augmentation to decrease 1) the amount of data needed to fine-tune a VLM that acts as a reward detector and 2) the amount of reward-labelled data needed for RL training. The LLM orchestrates this process, making the entire framework autonomous and independent of human supervision, and hence particularly suited to lifelong reinforcement learning scenarios. We empirically demonstrate gains in sample-efficiency when training in simulated robotics environments, including manipulation and navigation tasks, showing improvements in learning reward detectors, transferring past experience, and learning new tasks, all key abilities for efficient lifelong learning agents.}
}
Endnote
%0 Conference Paper
%T Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
%A Norman Di Palo
%A Leonard Hasenclever
%A Jan Humplik
%A Arunkumar Byravan
%B Proceedings of The 3rd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2025
%E Vincenzo Lomonaco
%E Stefano Melacci
%E Tinne Tuytelaars
%E Sarath Chandar
%E Razvan Pascanu
%F pmlr-v274-palo25a
%I PMLR
%P 268--284
%U https://proceedings.mlr.press/v274/palo25a.html
%V 274
%X We address the problem of sample-efficiency when training instruction-following embodied agents using reinforcement learning in a lifelong setting, where rewards may be sparse or absent. Our framework, which we call Diffusion Augmented Agent (DAAG), leverages a large language model (LLM), a vision language model (VLM), and a pipeline for using image diffusion models for temporally and geometrically consistent conditional video generation to relabel the agent’s past experience in hindsight. Given a video-instruction pair and a target instruction, we ask the LLM whether our diffusion model could transform the video into one consistent with the target instruction, and, if so, we apply this transformation. We use such hindsight data augmentation to decrease 1) the amount of data needed to fine-tune a VLM that acts as a reward detector and 2) the amount of reward-labelled data needed for RL training. The LLM orchestrates this process, making the entire framework autonomous and independent of human supervision, and hence particularly suited to lifelong reinforcement learning scenarios. We empirically demonstrate gains in sample-efficiency when training in simulated robotics environments, including manipulation and navigation tasks, showing improvements in learning reward detectors, transferring past experience, and learning new tasks, all key abilities for efficient lifelong learning agents.
APA
Palo, N.D., Hasenclever, L., Humplik, J. & Byravan, A. (2025). Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning. Proceedings of The 3rd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 274:268-284. Available from https://proceedings.mlr.press/v274/palo25a.html.
