TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization

Yuxuan Ding, Shuangge Wang, Tesca Fitzgerald
Proceedings of The 9th Conference on Robot Learning, PMLR 305:5129-5150, 2025.

Abstract

Robots often struggle to generalize from a single demonstration due to the lack of a transferable and interpretable spatial representation. In this work, we introduce TReF-6, a method that infers a simplified, abstracted 6DoF Task-Relevant Frame from a single trajectory. Our approach identifies an influence point purely from the trajectory geometry to define the origin for a local frame, which serves as a reference for parameterizing a Dynamic Movement Primitive (DMP). This influence point captures the task’s spatial structure, extending the standard DMP formulation beyond start-goal imitation. The inferred frame is semantically grounded via a vision-language model and localized in novel scenes by Grounded-SAM, enabling functionally consistent skill generalization. We validate TReF-6 in simulation and demonstrate robustness to trajectory noise. We further deploy an end-to-end pipeline on real-world manipulation tasks, showing that TReF-6 supports one-shot imitation learning that preserves task intent across diverse object configurations.
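For context on the abstract above: the "standard DMP formulation" it extends is, in the usual form (Ijspeert et al.), a second-order attractor system whose generalization depends only on the demonstrated start y_0 and goal g. The sketch below shows that standard system; it is not the paper's influence-point parameterization, which the abstract does not spell out.

\tau \dot{z} = \alpha_z\big(\beta_z (g - y) - z\big) + f(x), \qquad \tau \dot{y} = z, \qquad \tau \dot{x} = -\alpha_x x

f(x) = \frac{\sum_i \psi_i(x)\, w_i}{\sum_i \psi_i(x)}\; x\, (g - y_0)

Because the forcing term f is scaled only by (g - y_0), a learned trajectory generalizes purely by shifting its start and goal; per the abstract, TReF-6 instead anchors the parameterization in an inferred 6DoF task-relevant frame whose origin is the influence point recovered from the trajectory geometry.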

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-ding25a,
  title     = {TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization},
  author    = {Ding, Yuxuan and Wang, Shuangge and Fitzgerald, Tesca},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {5129--5150},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/ding25a/ding25a.pdf},
  url       = {https://proceedings.mlr.press/v305/ding25a.html},
  abstract  = {Robots often struggle to generalize from a single demonstration due to the lack of a transferable and interpretable spatial representation. In this work, we introduce TReF-6, a method that infers a simplified, abstracted 6DoF Task-Relevant Frame from a single trajectory. Our approach identifies an influence point purely from the trajectory geometry to define the origin for a local frame, which serves as a reference for parameterizing a Dynamic Movement Primitive (DMP). This influence point captures the task’s spatial structure, extending the standard DMP formulation beyond start-goal imitation. The inferred frame is semantically grounded via a vision-language model and localized in novel scenes by Grounded-SAM, enabling functionally consistent skill generalization. We validate TReF-6 in simulation and demonstrate robustness to trajectory noise. We further deploy an end-to-end pipeline on real-world manipulation tasks, showing that TReF-6 supports one-shot imitation learning that preserves task intent across diverse object configurations.}
}
Endnote
%0 Conference Paper
%T TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization
%A Yuxuan Ding
%A Shuangge Wang
%A Tesca Fitzgerald
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-ding25a
%I PMLR
%P 5129--5150
%U https://proceedings.mlr.press/v305/ding25a.html
%V 305
%X Robots often struggle to generalize from a single demonstration due to the lack of a transferable and interpretable spatial representation. In this work, we introduce TReF-6, a method that infers a simplified, abstracted 6DoF Task-Relevant Frame from a single trajectory. Our approach identifies an influence point purely from the trajectory geometry to define the origin for a local frame, which serves as a reference for parameterizing a Dynamic Movement Primitive (DMP). This influence point captures the task’s spatial structure, extending the standard DMP formulation beyond start-goal imitation. The inferred frame is semantically grounded via a vision-language model and localized in novel scenes by Grounded-SAM, enabling functionally consistent skill generalization. We validate TReF-6 in simulation and demonstrate robustness to trajectory noise. We further deploy an end-to-end pipeline on real-world manipulation tasks, showing that TReF-6 supports one-shot imitation learning that preserves task intent across diverse object configurations.
APA
Ding, Y., Wang, S. & Fitzgerald, T. (2025). TReF-6: Inferring Task-Relevant Frames from a Single Demonstration for One-Shot Skill Generalization. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:5129-5150. Available from https://proceedings.mlr.press/v305/ding25a.html.
