Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration

Michael Murray; Abhishek Gupta; Maya Cakmak

Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration

Michael Murray, Abhishek Gupta, Maya Cakmak

Proceedings of The 8th Conference on Robot Learning, PMLR 270:4033-4050, 2025.

Abstract

We introduce a modular, neuro-symbolic framework for teaching robots new skills through language and visual demonstration. Our approach, ShowTell, composes a mixture of foundation models to synthesize robot manipulation programs that are easy to interpret and generalize across a wide range of tasks and environments. ShowTell is designed to handle complex demonstrations involving high level logic such as loops and conditionals while being intuitive and natural for end-users. We validate this approach through a series of real-world robot experiments, showing that ShowTell out-performs a state-of-the-art baseline based on GPT4-V, on a variety of tasks, and that it is able to generalize to unseen environments and within category objects.

Cite this Paper

BibTeX

@InProceedings{pmlr-v270-murray25a,
  title = 	 {Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration},
  author =       {Murray, Michael and Gupta, Abhishek and Cakmak, Maya},
  booktitle = 	 {Proceedings of The 8th Conference on Robot Learning},
  pages = 	 {4033--4050},
  year = 	 {2025},
  editor = 	 {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = 	 {270},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v270/main/assets/murray25a/murray25a.pdf},
  url = 	 {https://proceedings.mlr.press/v270/murray25a.html},
  abstract = 	 {We introduce a modular, neuro-symbolic framework for teaching robots new skills through language and visual demonstration. Our approach, ShowTell, composes a mixture of foundation models to synthesize robot manipulation programs that are easy to interpret and generalize across a wide range of tasks and environments. ShowTell is designed to handle complex demonstrations involving high level logic such as loops and conditionals while being intuitive and natural for end-users. We validate this approach through a series of real-world robot experiments, showing that ShowTell out-performs a state-of-the-art baseline based on GPT4-V, on a variety of tasks, and that it is able to generalize to unseen environments and within category objects.}
}

Endnote

%0 Conference Paper
%T Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration
%A Michael Murray
%A Abhishek Gupta
%A Maya Cakmak
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard	
%F pmlr-v270-murray25a
%I PMLR
%P 4033--4050
%U https://proceedings.mlr.press/v270/murray25a.html
%V 270
%X We introduce a modular, neuro-symbolic framework for teaching robots new skills through language and visual demonstration. Our approach, ShowTell, composes a mixture of foundation models to synthesize robot manipulation programs that are easy to interpret and generalize across a wide range of tasks and environments. ShowTell is designed to handle complex demonstrations involving high level logic such as loops and conditionals while being intuitive and natural for end-users. We validate this approach through a series of real-world robot experiments, showing that ShowTell out-performs a state-of-the-art baseline based on GPT4-V, on a variety of tasks, and that it is able to generalize to unseen environments and within category objects.

APA

Murray, M., Gupta, A. & Cakmak, M.. (2025). Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:4033-4050 Available from https://proceedings.mlr.press/v270/murray25a.html.

Teaching Robots with Show and Tell: Using Foundation Models to Synthesize Robot Policies from Language and Visual Demonstration

Abstract

Cite this Paper

Related Material