CoRI: Communication of Robot Intent for Physical Human-Robot Interaction

Junxiang Wang, Emek Barış Küçüktabak, Rana Soltani Zarrin, Zackory Erickson
Proceedings of The 9th Conference on Robot Learning, PMLR 305:2360-2390, 2025.

Abstract

Clear communication of robot intent fosters transparency and interpretability in physical human-robot interaction (pHRI), particularly during assistive tasks involving direct human-robot contact. We introduce CoRI, a pipeline that automatically generates natural language communication of a robot’s upcoming actions directly from its motion plan and visual perception. Our pipeline first processes the robot’s image view to identify human poses and key environmental features. It then encodes the planned 3D spatial trajectory (including velocity and force) onto this view, visually grounding the path and its dynamics. CoRI queries a vision-language model with this visual representation to interpret the planned action within the visual context before generating concise, user-directed statements, without relying on task-specific information. Results from a user study involving robot-assisted feeding, bathing, and shaving tasks across two different robots indicate that CoRI leads to a statistically significant improvement in communication clarity compared to a baseline communication strategy. Specifically, CoRI effectively conveys not only the robot’s high-level intentions but also crucial details about its motion and any collaborative user action needed.
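
A minimal sketch of the pipeline as the abstract describes it, for illustration only; this is not the authors' implementation, and every name below (PlannedMotion, project_to_image, detect_pose, query_vlm, communicate_intent) is a hypothetical stand-in. The sketch mirrors the three stages the abstract names: ground the scene with a pose estimator, project the planned 3D trajectory (with its velocity and force profile) onto the camera view, and query a vision-language model for a concise user-directed statement.

"""Minimal sketch of a CoRI-style pipeline (hypothetical names throughout)."""
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class PlannedMotion:
    waypoints: np.ndarray   # (N, 3) end-effector positions in the camera frame, meters
    velocities: np.ndarray  # (N,) planned speed at each waypoint, m/s
    forces: np.ndarray      # (N,) planned contact force at each waypoint, newtons


def project_to_image(points_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Pinhole projection of camera-frame 3D points onto the image plane."""
    uvw = (K @ points_3d.T).T          # (N, 3) homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # (N, 2) pixel coordinates


def communicate_intent(
    image: np.ndarray,
    motion: PlannedMotion,
    K: np.ndarray,
    detect_pose: Callable[[np.ndarray], dict],
    query_vlm: Callable[[np.ndarray, str], str],
) -> str:
    """Turn a motion plan plus the robot's camera view into a user-directed statement.

    `detect_pose` and `query_vlm` stand in for an off-the-shelf human pose
    estimator and a vision-language model client; the abstract specifies neither.
    """
    # Stage 1: ground the scene -- human pose and key environmental features.
    pose = detect_pose(image)

    # Stage 2: encode the planned 3D trajectory onto the view. A real system
    # would render `path_2d` onto the image, colored by velocity/force;
    # the drawing itself is omitted in this sketch.
    path_2d = project_to_image(motion.waypoints, K)
    annotated = image.copy()

    # Stage 3: have the VLM interpret the plan in visual context, then produce
    # a concise, user-directed statement with no task-specific information.
    prompt = (
        "The overlaid path shows the robot's planned end-effector motion, "
        f"reaching up to {motion.velocities.max():.2f} m/s and "
        f"{motion.forces.max():.1f} N of contact force near the person "
        f"(detected body parts: {sorted(pose)}). In one or two sentences, "
        "tell the user what the robot is about to do and any action they should take."
    )
    return query_vlm(annotated, prompt)

Injecting the pose estimator and the VLM client as callables keeps the sketch free of task-specific logic, in line with the abstract's claim that CoRI does not rely on task-specific information.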

Cite this Paper

BibTeX
@InProceedings{pmlr-v305-wang25e,
  title     = {CoRI: Communication of Robot Intent for Physical Human-Robot Interaction},
  author    = {Wang, Junxiang and K\"{u}\c{c}\"{u}ktabak, Emek Bar{\i}\c{s} and Zarrin, Rana Soltani and Erickson, Zackory},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {2360--2390},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/wang25e/wang25e.pdf},
  url       = {https://proceedings.mlr.press/v305/wang25e.html},
  abstract  = {Clear communication of robot intent fosters transparency and interpretability in physical human-robot interaction (pHRI), particularly during assistive tasks involving direct human-robot contact. We introduce CoRI, a pipeline that automatically generates natural language communication of a robot’s upcoming actions directly from its motion plan and visual perception. Our pipeline first processes the robot’s image view to identify human poses and key environmental features. It then encodes the planned 3D spatial trajectory (including velocity and force) onto this view, visually grounding the path and its dynamics. CoRI queries a vision-language model with this visual representation to interpret the planned action within the visual context before generating concise, user-directed statements, without relying on task-specific information. Results from a user study involving robot-assisted feeding, bathing, and shaving tasks across two different robots indicate that CoRI leads to a statistically significant improvement in communication clarity compared to a baseline communication strategy. Specifically, CoRI effectively conveys not only the robot’s high-level intentions but also crucial details about its motion and any collaborative user action needed.}
}
Endnote
%0 Conference Paper
%T CoRI: Communication of Robot Intent for Physical Human-Robot Interaction
%A Junxiang Wang
%A Emek Barış Küçüktabak
%A Rana Soltani Zarrin
%A Zackory Erickson
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-wang25e
%I PMLR
%P 2360--2390
%U https://proceedings.mlr.press/v305/wang25e.html
%V 305
%X Clear communication of robot intent fosters transparency and interpretability in physical human-robot interaction (pHRI), particularly during assistive tasks involving direct human-robot contact. We introduce CoRI, a pipeline that automatically generates natural language communication of a robot’s upcoming actions directly from its motion plan and visual perception. Our pipeline first processes the robot’s image view to identify human poses and key environmental features. It then encodes the planned 3D spatial trajectory (including velocity and force) onto this view, visually grounding the path and its dynamics. CoRI queries a vision-language model with this visual representation to interpret the planned action within the visual context before generating concise, user-directed statements, without relying on task-specific information. Results from a user study involving robot-assisted feeding, bathing, and shaving tasks across two different robots indicate that CoRI leads to a statistically significant improvement in communication clarity compared to a baseline communication strategy. Specifically, CoRI effectively conveys not only the robot’s high-level intentions but also crucial details about its motion and any collaborative user action needed.
APA
Wang, J., Küçüktabak, E.B., Zarrin, R.S. & Erickson, Z. (2025). CoRI: Communication of Robot Intent for Physical Human-Robot Interaction. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:2360-2390. Available from https://proceedings.mlr.press/v305/wang25e.html.