ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations

Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh Anand Sontakke, Joseph J Lim, Jesse Thomason, Erdem Biyik, Jesse Zhang
Proceedings of The 9th Conference on Robot Learning, PMLR 305:460-488, 2025.

Abstract

We introduce ReWiND, a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. Standard reinforcement learning (RL) and imitation learning methods require expert supervision through human-designed reward functions or demonstrations for every new task. In contrast, ReWiND starts from a small demonstration dataset to learn: (1) a data-efficient, language-conditioned reward function that labels the dataset with rewards, and (2) a language-conditioned policy pre-trained with offline RL using these rewards. Given an unseen task variation, ReWiND fine-tunes the pre-trained policy using the learned reward function, requiring minimal online interaction. We show that ReWiND’s reward model generalizes effectively to unseen tasks, outperforming baselines by up to 2.4X in reward generalization and policy alignment metrics. Finally, we demonstrate that ReWiND enables sample-efficient adaptation to new tasks in both simulation and on a real bimanual manipulation platform, taking a step towards scalable, real-world robot learning.
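
The abstract describes a three-stage pipeline: learn a language-conditioned reward model from a small demonstration set, pre-train a language-conditioned policy with offline RL on that reward-relabeled data, then fine-tune online on an unseen task variation using only the learned reward. The sketch below is a minimal, hypothetical illustration of that structure; every class, function, and signature here is a placeholder invented for exposition (the paper's actual reward-model and policy architectures are not specified in this abstract).

```python
# Conceptual sketch of the ReWiND pipeline as described in the abstract.
# All names (LanguageRewardModel, LanguagePolicy, rewind_pipeline, ...) are
# hypothetical placeholders, not the authors' API.

from dataclasses import dataclass
from typing import Callable, List, Sequence

import numpy as np


@dataclass
class Episode:
    """A demonstration: observations, actions, and its language instruction."""
    observations: np.ndarray  # (T, obs_dim)
    actions: np.ndarray       # (T, act_dim)
    instruction: str


class LanguageRewardModel:
    """Placeholder for the language-conditioned reward function R(o_t, l)."""

    def fit(self, demos: Sequence[Episode]) -> None:
        # In the paper this is trained from a small demonstration dataset;
        # here we only record the instructions seen during training.
        self.seen_instructions = {d.instruction for d in demos}

    def reward(self, observation: np.ndarray, instruction: str) -> float:
        # Stand-in scalar; a real model would score progress toward
        # completing the instruction given the observation.
        return float(np.tanh(observation.mean()))


class LanguagePolicy:
    """Placeholder language-conditioned policy pi(a | o, l)."""

    def pretrain_offline(self, demos: Sequence[Episode],
                         reward_fn: Callable[[np.ndarray, str], float]) -> None:
        # Offline RL on the demo dataset, relabeled with learned rewards.
        self.offline_return = sum(
            reward_fn(o, d.instruction) for d in demos for o in d.observations
        )

    def finetune_online(self, env_step: Callable, instruction: str,
                        reward_fn: Callable[[np.ndarray, str], float],
                        steps: int = 1000) -> None:
        # Online fine-tuning on an unseen task variation, supervised only by
        # the learned reward -- no new demonstrations are collected.
        obs = np.zeros(4)
        for _ in range(steps):
            action = np.random.uniform(-1.0, 1.0, size=2)  # placeholder actor
            obs = env_step(obs, action)
            _ = reward_fn(obs, instruction)                 # drives the RL update


def rewind_pipeline(demos: List[Episode], new_instruction: str,
                    env_step: Callable) -> LanguagePolicy:
    reward_model = LanguageRewardModel()
    reward_model.fit(demos)                                 # (1) learn rewards

    policy = LanguagePolicy()
    policy.pretrain_offline(demos, reward_model.reward)     # (2) offline RL pretraining

    policy.finetune_online(env_step, new_instruction,       # (3) adapt to unseen task
                           reward_model.reward)
    return policy
```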

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-zhang25a,
  title     = {ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations},
  author    = {Zhang, Jiahui and Luo, Yusen and Anwar, Abrar and Sontakke, Sumedh Anand and Lim, Joseph J and Thomason, Jesse and Biyik, Erdem and Zhang, Jesse},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {460--488},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/zhang25a/zhang25a.pdf},
  url       = {https://proceedings.mlr.press/v305/zhang25a.html},
  abstract  = {We introduce ReWiND, a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. Standard reinforcement learning (RL) and imitation learning methods require expert supervision through human-designed reward functions or demonstrations for every new task. In contrast, ReWiND starts from a small demonstration dataset to learn: (1) a data-efficient, language-conditioned reward function that labels the dataset with rewards, and (2) a language-conditioned policy pre-trained with offline RL using these rewards. Given an unseen task variation, ReWiND fine-tunes the pre-trained policy using the learned reward function, requiring minimal online interaction. We show that ReWiND’s reward model generalizes effectively to unseen tasks, outperforming baselines by up to 2.4X in reward generalization and policy alignment metrics. Finally, we demonstrate that ReWiND enables sample-efficient adaptation to new tasks in both simulation and on a real bimanual manipulation platform, taking a step towards scalable, real-world robot learning.}
}
Endnote
%0 Conference Paper
%T ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
%A Jiahui Zhang
%A Yusen Luo
%A Abrar Anwar
%A Sumedh Anand Sontakke
%A Joseph J Lim
%A Jesse Thomason
%A Erdem Biyik
%A Jesse Zhang
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-zhang25a
%I PMLR
%P 460--488
%U https://proceedings.mlr.press/v305/zhang25a.html
%V 305
%X We introduce ReWiND, a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. Standard reinforcement learning (RL) and imitation learning methods require expert supervision through human-designed reward functions or demonstrations for every new task. In contrast, ReWiND starts from a small demonstration dataset to learn: (1) a data-efficient, language-conditioned reward function that labels the dataset with rewards, and (2) a language-conditioned policy pre-trained with offline RL using these rewards. Given an unseen task variation, ReWiND fine-tunes the pre-trained policy using the learned reward function, requiring minimal online interaction. We show that ReWiND’s reward model generalizes effectively to unseen tasks, outperforming baselines by up to 2.4X in reward generalization and policy alignment metrics. Finally, we demonstrate that ReWiND enables sample-efficient adaptation to new tasks in both simulation and on a real bimanual manipulation platform, taking a step towards scalable, real-world robot learning.
APA
Zhang, J., Luo, Y., Anwar, A., Sontakke, S.A., Lim, J.J., Thomason, J., Biyik, E. & Zhang, J. (2025). ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:460-488. Available from https://proceedings.mlr.press/v305/zhang25a.html.