SDS – See it, Do it, Sorted: Quadruped Skill Synthesis from Single Video Demonstration

Maria Stamatopoulou, Jeffrey Li, Dimitrios Kanoulas
Proceedings of The 9th Conference on Robot Learning, PMLR 305:1879-1897, 2025.

Abstract

Imagine a robot learning locomotion skills from any single video, without labels or reward engineering. We introduce SDS ("See it. Do it. Sorted."), an automated pipeline for skill acquisition from unstructured video demonstrations. Using GPT-4o, SDS applies novel prompting techniques, in the form of spatio-temporal grid-based visual encoding (Gv) and structured input decomposition (SUS). These produce executable reward functions (RF) from raw input videos. The RFs are used to train PPO policies and are optimized through closed-loop evolution, using training footage and performance metrics as self-supervised signals. SDS allows quadrupeds (e.g., Unitree Go1) to learn four gaits—trot, bound, pace, and hop—achieving 100% gait matching fidelity, Dynamic Time Warping (DTW) distance in the order of 10^-6, and stable locomotion with zero failures, both in simulation and the real world. SDS generalizes to morphologically different quadrupeds (e.g., ANYmal) and outperforms prior work in data efficiency, training time, and engineering effort. Our code is open-source under: https://sdsreview.github.io/SDS_ANONYM/
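As a minimal illustration of the Dynamic Time Warping (DTW) gait-matching metric reported above, the Python sketch below (not the authors' code) computes a DTW distance between two per-frame foot-contact sequences. The contact encoding, sequence lengths, and variable names are illustrative assumptions, not details taken from the paper.

import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) DTW between two sequences of per-frame vectors."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # per-frame mismatch
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# Hypothetical foot-contact patterns (FL, FR, RL, RR) alternating diagonal pairs, as in a trot.
demo_contacts   = np.array([[1, 0, 0, 1], [0, 1, 1, 0]] * 10, dtype=float)
policy_contacts = np.array([[1, 0, 0, 1], [0, 1, 1, 0]] * 10, dtype=float)

print(dtw_distance(demo_contacts, policy_contacts))  # 0.0 for a perfect match

Identical sequences give a distance of zero, so DTW values as small as those reported above indicate that the learned gait's contact timing closely tracks the demonstration.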

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-stamatopoulou25a,
  title     = {SDS – See it, Do it, Sorted: Quadruped Skill Synthesis from Single Video Demonstration},
  author    = {Stamatopoulou, Maria and Li, Jeffrey and Kanoulas, Dimitrios},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {1879--1897},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/stamatopoulou25a/stamatopoulou25a.pdf},
  url       = {https://proceedings.mlr.press/v305/stamatopoulou25a.html},
  abstract  = {Imagine a robot learning locomotion skills from any single video, without labels or reward engineering. We introduce SDS ("See it. Do it. Sorted."), an automated pipeline for skill acquisition from unstructured video demonstrations. Using GPT-4o, SDS applies novel prompting techniques, in the form of spatio-temporal grid-based visual encoding (Gv) and structured input decomposition (SUS). These produce executable reward functions (RF) from raw input videos. The RFs are used to train PPO policies and are optimized through closed-loop evolution, using training footage and performance metrics as self-supervised signals. SDS allows quadrupeds (e.g., Unitree Go1) to learn four gaits—trot, bound, pace, and hop—achieving 100% gait matching fidelity, Dynamic Time Warping (DTW) distance in the order of 10^-6, and stable locomotion with zero failures, both in simulation and the real world. SDS generalizes to morphologically different quadrupeds (e.g., ANYmal) and outperforms prior work in data efficiency, training time, and engineering effort. Our code is open-source under: https://sdsreview.github.io/SDS_ANONYM/}
}
Endnote
%0 Conference Paper
%T SDS – See it, Do it, Sorted: Quadruped Skill Synthesis from Single Video Demonstration
%A Maria Stamatopoulou
%A Jeffrey Li
%A Dimitrios Kanoulas
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-stamatopoulou25a
%I PMLR
%P 1879--1897
%U https://proceedings.mlr.press/v305/stamatopoulou25a.html
%V 305
%X Imagine a robot learning locomotion skills from any single video, without labels or reward engineering. We introduce SDS ("See it. Do it. Sorted."), an automated pipeline for skill acquisition from unstructured video demonstrations. Using GPT-4o, SDS applies novel prompting techniques, in the form of spatio-temporal grid-based visual encoding (Gv) and structured input decomposition (SUS). These produce executable reward functions (RF) from raw input videos. The RFs are used to train PPO policies and are optimized through closed-loop evolution, using training footage and performance metrics as self-supervised signals. SDS allows quadrupeds (e.g., Unitree Go1) to learn four gaits—trot, bound, pace, and hop—achieving 100% gait matching fidelity, Dynamic Time Warping (DTW) distance in the order of 10^-6, and stable locomotion with zero failures, both in simulation and the real world. SDS generalizes to morphologically different quadrupeds (e.g., ANYmal) and outperforms prior work in data efficiency, training time, and engineering effort. Our code is open-source under: https://sdsreview.github.io/SDS_ANONYM/
APA
Stamatopoulou, M., Li, J. & Kanoulas, D. (2025). SDS – See it, Do it, Sorted: Quadruped Skill Synthesis from Single Video Demonstration. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:1879-1897. Available from https://proceedings.mlr.press/v305/stamatopoulou25a.html.
