I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences

Zihan Wang, Brian Liang, Varad Dhat, Zander Brumbaugh, Nick Walker, Ranjay Krishna, Maya Cakmak
Proceedings of The 8th Conference on Robot Learning, PMLR 270:1863-1890, 2025.

Abstract

Understanding robot behaviors and experiences through natural language is crucial for developing intelligent and transparent robotic systems. Recent advances in large language models (LLMs) make it possible to translate complex, multi-modal robotic experiences into coherent, human-readable narratives. However, grounding real-world robot experiences in natural language is challenging for many reasons, such as the multi-modal nature of the data, differing sample rates, and data volume. We introduce RONAR, an LLM-based system that generates natural language narrations from robot experiences, aiding in behavior announcement, failure analysis, and human interaction for failure recovery. Evaluated across various scenarios, RONAR outperforms state-of-the-art methods and improves failure recovery efficiency. Our contributions include a multi-modal framework for robot experience narration, a comprehensive real-robot dataset, and empirical evidence of RONAR’s effectiveness in enhancing user experience in system transparency and failure analysis.

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-wang25g,
  title = {I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences},
  author = {Wang, Zihan and Liang, Brian and Dhat, Varad and Brumbaugh, Zander and Walker, Nick and Krishna, Ranjay and Cakmak, Maya},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages = {1863--1890},
  year = {2025},
  editor = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = {270},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Nov},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/wang25g/wang25g.pdf},
  url = {https://proceedings.mlr.press/v270/wang25g.html},
  abstract = {Understanding robot behaviors and experiences through natural language is crucial for developing intelligent and transparent robotic systems. Recent advances in large language models (LLMs) make it possible to translate complex, multi-modal robotic experiences into coherent, human-readable narratives. However, grounding real-world robot experiences in natural language is challenging for many reasons, such as the multi-modal nature of the data, differing sample rates, and data volume. We introduce RONAR, an LLM-based system that generates natural language narrations from robot experiences, aiding in behavior announcement, failure analysis, and human interaction for failure recovery. Evaluated across various scenarios, RONAR outperforms state-of-the-art methods and improves failure recovery efficiency. Our contributions include a multi-modal framework for robot experience narration, a comprehensive real-robot dataset, and empirical evidence of RONAR’s effectiveness in enhancing user experience in system transparency and failure analysis.}
}
Endnote
%0 Conference Paper
%T I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences
%A Zihan Wang
%A Brian Liang
%A Varad Dhat
%A Zander Brumbaugh
%A Nick Walker
%A Ranjay Krishna
%A Maya Cakmak
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-wang25g
%I PMLR
%P 1863--1890
%U https://proceedings.mlr.press/v270/wang25g.html
%V 270
%X Understanding robot behaviors and experiences through natural language is crucial for developing intelligent and transparent robotic systems. Recent advances in large language models (LLMs) make it possible to translate complex, multi-modal robotic experiences into coherent, human-readable narratives. However, grounding real-world robot experiences in natural language is challenging for many reasons, such as the multi-modal nature of the data, differing sample rates, and data volume. We introduce RONAR, an LLM-based system that generates natural language narrations from robot experiences, aiding in behavior announcement, failure analysis, and human interaction for failure recovery. Evaluated across various scenarios, RONAR outperforms state-of-the-art methods and improves failure recovery efficiency. Our contributions include a multi-modal framework for robot experience narration, a comprehensive real-robot dataset, and empirical evidence of RONAR’s effectiveness in enhancing user experience in system transparency and failure analysis.
APA
Wang, Z., Liang, B., Dhat, V., Brumbaugh, Z., Walker, N., Krishna, R. & Cakmak, M. (2025). I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:1863-1890. Available from https://proceedings.mlr.press/v270/wang25g.html.
