Adaptive Language-Guided Abstraction from Contrastive Explanations

Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie Shah, Jacob Andreas, Andreea Bobu
Proceedings of The 8th Conference on Robot Learning, PMLR 270:3425-3438, 2025.

Abstract

Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. In particularly complex, high-dimensional environments, human demonstrators often struggle to fully specify their desired behavior from a small number of demonstrations. End-to-end reward learning methods (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often learn generalizably from a small number of demonstrations by incorporating strong priors about which features of a demonstration are likely meaningful for a task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes ALGAE, a method that alternates between using language models to iteratively identify human-meaningful features needed to explain demonstrated behavior and using standard inverse reinforcement learning techniques to assign weights to these features. Experiments across a variety of simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features using only small numbers of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input, making it possible to quickly and efficiently acquire rich representations of user behavior.
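
The abstract describes a simple alternation: fit reward weights over the current feature set, check whether the resulting reward explains the demonstrations, and, if not, ask a language model to propose a missing human-meaningful feature. The Python sketch below is a minimal, self-contained illustration of that loop, not the paper's implementation: every function name here is hypothetical, the weight fit is a crude random-contrast surrogate for the IRL step, and the LM call is replaced by a stub that returns a fixed toy feature.

    # Illustrative sketch only; names and the IRL/LM stand-ins are assumptions,
    # not the paper's actual API.
    from typing import Callable, Dict, List

    import numpy as np

    State = Dict[str, float]
    Feature = Callable[[State], float]

    def fit_reward_weights(features: List[Feature], demos: List[List[State]]) -> np.ndarray:
        # Stand-in for the IRL step: point the weight vector from the mean
        # feature values of random states toward those of demonstrated states
        # (a crude max-margin surrogate, not the paper's actual learner).
        demo_phi = np.array([[f(s) for f in features] for traj in demos for s in traj])
        rand_phi = np.random.default_rng(0).normal(size=demo_phi.shape)
        w = demo_phi.mean(axis=0) - rand_phi.mean(axis=0)
        norm = np.linalg.norm(w)
        return w / norm if norm > 0 else w

    def reward_explains_demos(features, weights, demos, threshold=0.5) -> bool:
        # Stub convergence check: does the current reward already score the
        # demonstrated states highly?
        scores = [weights @ np.array([f(s) for f in features]) for traj in demos for s in traj]
        return bool(np.mean(scores) >= threshold)

    def propose_feature_with_lm(features, demos) -> Feature:
        # Stub for the language-model call. In ALGAE this is where an LM is
        # prompted to name, and define, a human-meaningful feature that
        # explains what the current reward gets wrong about the demos.
        return lambda s: -s.get("distance_to_goal", 0.0)

    def algae(demos: List[List[State]], features: List[Feature], max_rounds: int = 5):
        # Alternate between fitting weights (the IRL step) and asking the LM
        # for a missing feature until the reward explains the demonstrations.
        weights = fit_reward_weights(features, demos)
        for _ in range(max_rounds):
            if reward_explains_demos(features, weights, demos):
                break
            features = features + [propose_feature_with_lm(features, demos)]
            weights = fit_reward_weights(features, demos)
        return features, weights

    # Toy usage with an invented state representation:
    demos = [[{"distance_to_goal": 0.2, "speed": 0.5},
              {"distance_to_goal": 0.0, "speed": 0.4}]]
    features, weights = algae(demos, features=[lambda s: s.get("speed", 0.0)])

The design point the sketch preserves is that feature discovery and weight fitting are separate, interleaved steps: the LM is only consulted when the current feature set fails to explain the demonstrations, which is what lets the method add missing features without human input.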

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-peng25c,
  title     = {Adaptive Language-Guided Abstraction from Contrastive Explanations},
  author    = {Peng, Andi and Li, Belinda Z. and Sucholutsky, Ilia and Kumar, Nishanth and Shah, Julie and Andreas, Jacob and Bobu, Andreea},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {3425--3438},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/peng25c/peng25c.pdf},
  url       = {https://proceedings.mlr.press/v270/peng25c.html},
  abstract  = {Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. In particularly complex, high-dimensional environments, human demonstrators often struggle to fully specify their desired behavior from a small number of demonstrations. End-to-end reward learning methods (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often learn generalizably from a small number of demonstrations by incorporating strong priors about which features of a demonstration are likely meaningful for a task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes ALGAE, a method that alternates between using language models to iteratively identify human-meaningful features needed to explain demonstrated behavior and using standard inverse reinforcement learning techniques to assign weights to these features. Experiments across a variety of simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features using only small numbers of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input, making it possible to quickly and efficiently acquire rich representations of user behavior.}
}
Endnote
%0 Conference Paper
%T Adaptive Language-Guided Abstraction from Contrastive Explanations
%A Andi Peng
%A Belinda Z. Li
%A Ilia Sucholutsky
%A Nishanth Kumar
%A Julie Shah
%A Jacob Andreas
%A Andreea Bobu
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-peng25c
%I PMLR
%P 3425--3438
%U https://proceedings.mlr.press/v270/peng25c.html
%V 270
%X Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. In particularly complex, high-dimensional environments, human demonstrators often struggle to fully specify their desired behavior from a small number of demonstrations. End-to-end reward learning methods (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often learn generalizably from a small number of demonstrations by incorporating strong priors about which features of a demonstration are likely meaningful for a task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes ALGAE, a method that alternates between using language models to iteratively identify human-meaningful features needed to explain demonstrated behavior and using standard inverse reinforcement learning techniques to assign weights to these features. Experiments across a variety of simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features using only small numbers of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input, making it possible to quickly and efficiently acquire rich representations of user behavior.
APA
Peng, A., Li, B.Z., Sucholutsky, I., Kumar, N., Shah, J., Andreas, J., & Bobu, A. (2025). Adaptive Language-Guided Abstraction from Contrastive Explanations. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:3425-3438. Available from https://proceedings.mlr.press/v270/peng25c.html.
