Phantom: Training Robots Without Robots Using Only Human Videos
Proceedings of The 9th Conference on Robot Learning, PMLR 305:4545-4565, 2025.
Abstract
Training general-purpose robots requires learning from large and diverse data sources. Current approaches rely heavily on teleoperated demonstrations, which are difficult to scale. We present a scalable framework for training manipulation policies directly from human video demonstrations, requiring no robot data. Our method converts human demonstrations into robot-compatible observation-action pairs using hand pose estimation and visual data editing: we inpaint the human arm out of each frame and overlay a rendered robot to align the visual domains. This enables zero-shot deployment on real hardware without any fine-tuning. We demonstrate strong success rates of up to 92% on a range of tasks, including deformable object manipulation, multi-object sweeping, and insertion. Our approach generalizes to novel environments and supports closed-loop execution. By showing that effective policies can be trained using only human videos, our method broadens the path to scalable robot learning. Videos are available at https://phantom-training-robots.github.io.
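The conversion the abstract describes can be sketched in outline. The following is a minimal, hypothetical illustration of the pipeline shape only: the function names, data formats, and the choice of next-frame hand pose as the action label are all assumptions for illustration, not the authors' actual implementation (which uses real hand pose estimators, image inpainting, and robot rendering).

```python
# Hypothetical sketch of converting a human video demo into
# robot-compatible (observation, action) pairs. All components are
# trivial stand-ins; the real system would use a learned hand pose
# estimator, an image inpainting model, and a robot renderer.

def estimate_hand_pose(frame):
    """Stand-in for hand pose estimation: here we just read placeholder
    wrist position and gripper openness stored in the frame dict."""
    return frame["wrist_xyz"], frame["grip"]

def inpaint_human_arm(image):
    """Stand-in for visual inpainting: replace 'arm' pixels with
    background so the human is removed from the observation."""
    return [px if px != "arm" else "background" for px in image]

def overlay_rendered_robot(image, wrist_xyz):
    """Stand-in for rendering: paint a robot at the estimated pose."""
    return image + ["robot@" + str(wrist_xyz)]

def human_video_to_robot_pairs(frames):
    """Build training pairs: the observation at frame t is the edited
    image (arm inpainted, robot overlaid at the current hand pose);
    the action is the hand pose at frame t+1 (a common next-pose
    action parameterization, assumed here)."""
    pairs = []
    for t in range(len(frames) - 1):
        current_wrist, _ = estimate_hand_pose(frames[t])
        obs = overlay_rendered_robot(
            inpaint_human_arm(frames[t]["image"]), current_wrist
        )
        next_wrist, next_grip = estimate_hand_pose(frames[t + 1])
        pairs.append((obs, (next_wrist, next_grip)))
    return pairs

# Toy two-frame "video": the hand moves down and closes on a cube.
demo = [
    {"image": ["background", "arm", "cube"],
     "wrist_xyz": (0.0, 0.0, 0.10), "grip": 1.0},
    {"image": ["background", "arm", "cube"],
     "wrist_xyz": (0.0, 0.0, 0.05), "grip": 0.0},
]
pairs = human_video_to_robot_pairs(demo)
print(len(pairs))  # one (observation, action) pair from two frames
```

The key property the sketch mirrors is that no robot data enters the pipeline: observations are edited human video frames, and actions are derived entirely from the estimated hand trajectory.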