Trajectory Improvement and Reward Learning from Comparative Language Feedback

Zhaojing Yang, Miru Jun, Jeremy Tien, Stuart Russell, Anca Dragan, Erdem Biyik
Proceedings of The 8th Conference on Robot Learning, PMLR 270:5389-5404, 2025.

Abstract

Learning from human feedback has gained traction in fields like robotics and natural language processing in recent years. While prior works mostly rely on human feedback in the form of comparisons, language is a preferable modality that provides more informative insights into user preferences. In this work, we aim to incorporate comparative language feedback to iteratively improve robot trajectories and to learn reward functions that encode human preferences. To achieve this goal, we learn a shared latent space that integrates trajectory data and language feedback, and subsequently leverage the learned latent space to improve trajectories and learn human preferences. To the best of our knowledge, we are the first to incorporate comparative language feedback into reward learning. Our simulation experiments demonstrate the effectiveness of the learned latent space and the success of our learning algorithms. We also conduct human subject studies that show our reward learning algorithm achieves a 23.9% higher subjective score on average and is 11.3% more time-efficient compared to preference-based reward learning, underscoring the superior performance of our method. Our website is at https://liralab.usc.edu/comparative-language-feedback/.
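The abstract describes the pipeline only at a high level: embed trajectories and comparative utterances into a shared latent space, then use the language embedding both to nudge a trajectory toward the improvement the user asked for and to update a reward model. The sketch below is a minimal, illustrative rendering of that idea under stated assumptions; the placeholder encoders, the linear reward model, the additive latent-space update, and all step sizes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

LATENT_DIM = 32  # illustrative latent dimensionality

def encode_trajectory(trajectory: np.ndarray) -> np.ndarray:
    """Placeholder trajectory encoder: maps a (T, state_dim) rollout to R^d.
    A learned encoder would be used in practice."""
    rng = np.random.default_rng(abs(hash(trajectory.tobytes())) % (2**32))
    return rng.standard_normal(LATENT_DIM)

def encode_language(utterance: str) -> np.ndarray:
    """Placeholder language encoder: maps comparative feedback
    (e.g. "move slower near the table") to a unit direction in R^d."""
    rng = np.random.default_rng(abs(hash(utterance)) % (2**32))
    v = rng.standard_normal(LATENT_DIM)
    return v / np.linalg.norm(v)

def improve_in_latent_space(z_traj, z_lang, alpha=0.5):
    """Shift the trajectory embedding along the direction named by the feedback;
    a decoder (not shown) would map the result back to a trajectory."""
    return z_traj + alpha * z_lang

def update_linear_reward(weights, z_lang, lr=0.1):
    """One gradient step on a Bradley-Terry-style objective: the feedback
    direction should increase reward, i.e. sigmoid(w . z_lang) -> 1."""
    p = 1.0 / (1.0 + np.exp(-weights @ z_lang))
    return weights + lr * (1.0 - p) * z_lang

# Illustrative interaction loop.
weights = np.zeros(LATENT_DIM)
trajectory = np.zeros((10, 7))  # e.g. 10 timesteps of 7-DoF joint states
for utterance in ["move slower near the table", "keep the cup more upright"]:
    z_traj = encode_trajectory(trajectory)
    z_lang = encode_language(utterance)
    z_improved = improve_in_latent_space(z_traj, z_lang)
    weights = update_linear_reward(weights, z_lang)
```

The key design point the abstract implies is that a single shared latent space lets the same language embedding serve both purposes: as a direction for trajectory improvement and as a comparative signal for reward learning.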

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-yang25e,
  title     = {Trajectory Improvement and Reward Learning from Comparative Language Feedback},
  author    = {Yang, Zhaojing and Jun, Miru and Tien, Jeremy and Russell, Stuart and Dragan, Anca and Biyik, Erdem},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {5389--5404},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/yang25e/yang25e.pdf},
  url       = {https://proceedings.mlr.press/v270/yang25e.html},
  abstract  = {Learning from human feedback has gained traction in fields like robotics and natural language processing in recent years. While prior works mostly rely on human feedback in the form of comparisons, language is a preferable modality that provides more informative insights into user preferences. In this work, we aim to incorporate comparative language feedback to iteratively improve robot trajectories and to learn reward functions that encode human preferences. To achieve this goal, we learn a shared latent space that integrates trajectory data and language feedback, and subsequently leverage the learned latent space to improve trajectories and learn human preferences. To the best of our knowledge, we are the first to incorporate comparative language feedback into reward learning. Our simulation experiments demonstrate the effectiveness of the learned latent space and the success of our learning algorithms. We also conduct human subject studies that show our reward learning algorithm achieves a 23.9% higher subjective score on average and is 11.3% more time-efficient compared to preference-based reward learning, underscoring the superior performance of our method. Our website is at https://liralab.usc.edu/comparative-language-feedback/.}
}
Endnote
%0 Conference Paper
%T Trajectory Improvement and Reward Learning from Comparative Language Feedback
%A Zhaojing Yang
%A Miru Jun
%A Jeremy Tien
%A Stuart Russell
%A Anca Dragan
%A Erdem Biyik
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-yang25e
%I PMLR
%P 5389--5404
%U https://proceedings.mlr.press/v270/yang25e.html
%V 270
%X Learning from human feedback has gained traction in fields like robotics and natural language processing in recent years. While prior works mostly rely on human feedback in the form of comparisons, language is a preferable modality that provides more informative insights into user preferences. In this work, we aim to incorporate comparative language feedback to iteratively improve robot trajectories and to learn reward functions that encode human preferences. To achieve this goal, we learn a shared latent space that integrates trajectory data and language feedback, and subsequently leverage the learned latent space to improve trajectories and learn human preferences. To the best of our knowledge, we are the first to incorporate comparative language feedback into reward learning. Our simulation experiments demonstrate the effectiveness of the learned latent space and the success of our learning algorithms. We also conduct human subject studies that show our reward learning algorithm achieves a 23.9% higher subjective score on average and is 11.3% more time-efficient compared to preference-based reward learning, underscoring the superior performance of our method. Our website is at https://liralab.usc.edu/comparative-language-feedback/.
APA
Yang, Z., Jun, M., Tien, J., Russell, S., Dragan, A. & Biyik, E. (2025). Trajectory Improvement and Reward Learning from Comparative Language Feedback. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:5389-5404. Available from https://proceedings.mlr.press/v270/yang25e.html.
