Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration

Tyler Ga Wei Lum; Olivia Y. Lee; Karen Liu; Jeannette Bohg

Proceedings of The 9th Conference on Robot Learning, PMLR 305:4418-4441, 2025.

Abstract

Teaching robots dexterous manipulation skills often requires collecting hundreds of demonstrations using wearables or teleoperation, a process that is challenging to scale. Videos of human-object interactions are easier to collect and scale, but leveraging them directly for robot learning is difficult due to the lack of explicit action labels and human-robot embodiment differences. We propose Human2Sim2Robot, a novel real-to-sim-to-real framework for training dexterous manipulation policies using only one RGB-D video of a human demonstrating a task. Our method utilizes reinforcement learning (RL) in simulation to cross the embodiment gap without relying on wearables, teleoperation, or large-scale data collection. From the video, we extract: (1) the object pose trajectory to define an object-centric, embodiment-agnostic reward, and (2) the pre-manipulation hand pose to initialize and guide exploration during RL training. These components enable effective policy learning without any task-specific reward tuning. In the single human demo regime, Human2Sim2Robot outperforms object-aware replay by over 55% and imitation learning by over 68% on grasping, non-prehensile manipulation, and multi-step tasks. Website: https://human2sim2robot.github.io

Cite this Paper

BibTeX

@InProceedings{pmlr-v305-lum25a,
  title = {Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration},
  author = {Lum, Tyler Ga Wei and Lee, Olivia Y. and Liu, Karen and Bohg, Jeannette},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages = {4418--4441},
  year = {2025},
  editor = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume = {305},
  series = {Proceedings of Machine Learning Research},
  month = {27--30 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/lum25a/lum25a.pdf},
  url = {https://proceedings.mlr.press/v305/lum25a.html},
  abstract = {Teaching robots dexterous manipulation skills often requires collecting hundreds of demonstrations using wearables or teleoperation, a process that is challenging to scale. Videos of human-object interactions are easier to collect and scale, but leveraging them directly for robot learning is difficult due to the lack of explicit action labels and human-robot embodiment differences. We propose Human2Sim2Robot, a novel real-to-sim-to-real framework for training dexterous manipulation policies using only one RGB-D video of a human demonstrating a task. Our method utilizes reinforcement learning (RL) in simulation to cross the embodiment gap without relying on wearables, teleoperation, or large-scale data collection. From the video, we extract: (1) the object pose trajectory to define an object-centric, embodiment-agnostic reward, and (2) the pre-manipulation hand pose to initialize and guide exploration during RL training. These components enable effective policy learning without any task-specific reward tuning. In the single human demo regime, Human2Sim2Robot outperforms object-aware replay by over 55\% and imitation learning by over 68\% on grasping, non-prehensile manipulation, and multi-step tasks. Website: https://human2sim2robot.github.io}
}

Endnote

%0 Conference Paper
%T Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
%A Tyler Ga Wei Lum
%A Olivia Y. Lee
%A Karen Liu
%A Jeannette Bohg
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-lum25a
%I PMLR
%P 4418--4441
%U https://proceedings.mlr.press/v305/lum25a.html
%V 305
%X Teaching robots dexterous manipulation skills often requires collecting hundreds of demonstrations using wearables or teleoperation, a process that is challenging to scale. Videos of human-object interactions are easier to collect and scale, but leveraging them directly for robot learning is difficult due to the lack of explicit action labels and human-robot embodiment differences. We propose Human2Sim2Robot, a novel real-to-sim-to-real framework for training dexterous manipulation policies using only one RGB-D video of a human demonstrating a task. Our method utilizes reinforcement learning (RL) in simulation to cross the embodiment gap without relying on wearables, teleoperation, or large-scale data collection. From the video, we extract: (1) the object pose trajectory to define an object-centric, embodiment-agnostic reward, and (2) the pre-manipulation hand pose to initialize and guide exploration during RL training. These components enable effective policy learning without any task-specific reward tuning. In the single human demo regime, Human2Sim2Robot outperforms object-aware replay by over 55% and imitation learning by over 68% on grasping, non-prehensile manipulation, and multi-step tasks. Website: https://human2sim2robot.github.io

APA

Lum, T.G.W., Lee, O.Y., Liu, K. & Bohg, J. (2025). Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:4418-4441. Available from https://proceedings.mlr.press/v305/lum25a.html.
