RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

Priya Sundaresan, Quan Vuong, Jiayuan Gu, Peng Xu, Ted Xiao, Sean Kirmani, Tianhe Yu, Michael Stark, Ajinkya Jain, Karol Hausman, Dorsa Sadigh, Jeannette Bohg, Stefan Schaal
Proceedings of The 8th Conference on Robot Learning, PMLR 270:70-96, 2025.

Abstract

Natural language and images are commonly used as goal representations in goal-conditioned imitation learning. However, language can be ambiguous and images can be over-specified. In this work, we study hand-drawn sketches as a modality for goal specification. Like language, sketches can be easy to provide on the fly, but like images, they can also help a downstream policy be spatially aware. By virtue of being minimal, sketches can further help disambiguate task-relevant from irrelevant objects. We present RT-Sketch, a goal-conditioned policy for manipulation that takes a hand-drawn sketch of the desired scene as input and outputs actions. We train RT-Sketch on a dataset of trajectories paired with synthetically generated goal sketches. We evaluate this approach on six manipulation skills involving tabletop object rearrangements on an articulated countertop. Experimentally, we find that RT-Sketch performs comparably to image- or language-conditioned agents in straightforward settings, while achieving greater robustness when language goals are ambiguous or visual distractors are present. Additionally, we show that RT-Sketch handles sketches with varied levels of specificity, ranging from minimal line drawings to detailed, colored drawings. For supplementary material and videos, please visit http://rt-sketch.github.io.
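The abstract states only that training trajectories are paired with synthetically generated goal sketches. One way to picture that pairing is hindsight relabeling: treat the last frame of each trajectory as the goal, render it as a sketch, and attach that sketch to every step. The Python below is a minimal sketch under that assumption. `image_to_sketch` is a hypothetical stand-in (a thresholded gradient edge map), not the sketch generator used for RT-Sketch, and `relabel_with_sketch_goal` is likewise an illustrative name.

```python
import numpy as np


def image_to_sketch(image: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Hypothetical stand-in for a learned image-to-sketch model.

    Marks pixels whose grayscale gradient magnitude exceeds `threshold`,
    producing a crude binary line drawing. This is NOT the RT-Sketch method.
    """
    # Collapse RGB to grayscale by averaging channels (if a channel axis exists).
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    gy, gx = np.gradient(gray.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)


def relabel_with_sketch_goal(observations, actions):
    """Pair every (observation, action) step with a sketch of the final frame.

    Assumption: the final observation shows the achieved goal scene.
    Returns a list of (observation, goal_sketch, action) tuples.
    """
    if len(observations) != len(actions):
        raise ValueError("observations and actions must have the same length")
    # One goal sketch per trajectory, shared by all of its steps.
    goal_sketch = image_to_sketch(observations[-1])
    return [(obs, goal_sketch, act) for obs, act in zip(observations, actions)]
```

Each returned (observation, goal sketch, action) tuple could then serve as a supervised example for a policy that maps the current observation and the goal sketch to an action.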

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-sundaresan25a,
  title = {RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches},
  author = {Sundaresan, Priya and Vuong, Quan and Gu, Jiayuan and Xu, Peng and Xiao, Ted and Kirmani, Sean and Yu, Tianhe and Stark, Michael and Jain, Ajinkya and Hausman, Karol and Sadigh, Dorsa and Bohg, Jeannette and Schaal, Stefan},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages = {70--96},
  year = {2025},
  editor = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = {270},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Nov},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/sundaresan25a/sundaresan25a.pdf},
  url = {https://proceedings.mlr.press/v270/sundaresan25a.html},
  abstract = {Natural language and images are commonly used as goal representations in goal-conditioned imitation learning. However, language can be ambiguous and images can be over-specified. In this work, we study hand-drawn sketches as a modality for goal specification. Sketches can be easy to provide on the fly like language, but like images they can also help a downstream policy to be spatially-aware. By virtue of being minimal, sketches can further help disambiguate task-relevant from irrelevant objects. We present RT-Sketch, a goal-conditioned policy for manipulation that takes a hand-drawn sketch of the desired scene as input, and outputs actions. We train RT-Sketch on a dataset of trajectories paired with synthetically generated goal sketches. We evaluate this approach on six manipulation skills involving tabletop object rearrangements on an articulated countertop. Experimentally we find that RT-Sketch performs comparably to image or language-conditioned agents in straightforward settings, while achieving greater robustness when language goals are ambiguous or visual distractors are present. Additionally, we show that RT-Sketch handles sketches with varied levels of specificity, ranging from minimal line drawings to detailed, colored drawings. For supplementary material and videos, please visit http://rt-sketch.github.io.}
}
Endnote
%0 Conference Paper
%T RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches
%A Priya Sundaresan
%A Quan Vuong
%A Jiayuan Gu
%A Peng Xu
%A Ted Xiao
%A Sean Kirmani
%A Tianhe Yu
%A Michael Stark
%A Ajinkya Jain
%A Karol Hausman
%A Dorsa Sadigh
%A Jeannette Bohg
%A Stefan Schaal
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-sundaresan25a
%I PMLR
%P 70--96
%U https://proceedings.mlr.press/v270/sundaresan25a.html
%V 270
%X Natural language and images are commonly used as goal representations in goal-conditioned imitation learning. However, language can be ambiguous and images can be over-specified. In this work, we study hand-drawn sketches as a modality for goal specification. Sketches can be easy to provide on the fly like language, but like images they can also help a downstream policy to be spatially-aware. By virtue of being minimal, sketches can further help disambiguate task-relevant from irrelevant objects. We present RT-Sketch, a goal-conditioned policy for manipulation that takes a hand-drawn sketch of the desired scene as input, and outputs actions. We train RT-Sketch on a dataset of trajectories paired with synthetically generated goal sketches. We evaluate this approach on six manipulation skills involving tabletop object rearrangements on an articulated countertop. Experimentally we find that RT-Sketch performs comparably to image or language-conditioned agents in straightforward settings, while achieving greater robustness when language goals are ambiguous or visual distractors are present. Additionally, we show that RT-Sketch handles sketches with varied levels of specificity, ranging from minimal line drawings to detailed, colored drawings. For supplementary material and videos, please visit http://rt-sketch.github.io.
APA
Sundaresan, P., Vuong, Q., Gu, J., Xu, P., Xiao, T., Kirmani, S., Yu, T., Stark, M., Jain, A., Hausman, K., Sadigh, D., Bohg, J. & Schaal, S. (2025). RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:70-96. Available from https://proceedings.mlr.press/v270/sundaresan25a.html.
