Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration

Tyler Ga Wei Lum, Olivia Y. Lee, Karen Liu, Jeannette Bohg
Proceedings of The 9th Conference on Robot Learning, PMLR 305:4418-4441, 2025.

Abstract

Teaching robots dexterous manipulation skills often requires collecting hundreds of demonstrations using wearables or teleoperation, a process that is challenging to scale. Videos of human-object interactions are easier to collect and scale, but leveraging them directly for robot learning is difficult due to the lack of explicit action labels and human-robot embodiment differences. We propose Human2Sim2Robot, a novel real-to-sim-to-real framework for training dexterous manipulation policies using only one RGB-D video of a human demonstrating a task. Our method utilizes reinforcement learning (RL) in simulation to cross the embodiment gap without relying on wearables, teleoperation, or large-scale data collection. From the video, we extract: (1) the object pose trajectory to define an object-centric, embodiment-agnostic reward, and (2) the pre-manipulation hand pose to initialize and guide exploration during RL training. These components enable effective policy learning without any task-specific reward tuning. In the single human demo regime, Human2Sim2Robot outperforms object-aware replay by over 55% and imitation learning by over 68% on grasping, non-prehensile manipulation, and multi-step tasks. Website: https://human2sim2robot.github.io
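The core technical idea in the abstract is an object-centric, embodiment-agnostic reward derived from the demonstrated object pose trajectory. The paper's actual reward formulation is not reproduced on this page, so the snippet below is only a minimal illustrative sketch of that idea: a dense reward that scores how closely the simulated object's pose tracks a reference pose from the extracted trajectory. The function name, exponential shaping, and scale values are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def object_tracking_reward(obj_pos, obj_quat, ref_pos, ref_quat,
                           pos_scale=10.0, rot_scale=2.0):
    """Illustrative object-centric reward: higher when the simulated object's
    pose is close to the demonstrated reference pose. It references only the
    object state, never human or robot joints, which is what makes it
    embodiment-agnostic."""
    pos_err = np.linalg.norm(np.asarray(obj_pos) - np.asarray(ref_pos))
    # Geodesic angle between unit quaternions (abs handles the double cover).
    dot = np.clip(abs(np.dot(obj_quat, ref_quat)), 0.0, 1.0)
    rot_err = 2.0 * np.arccos(dot)
    return np.exp(-pos_scale * pos_err) + np.exp(-rot_scale * rot_err)

# Example: reward for one simulation step against one demo waypoint.
r = object_tracking_reward(obj_pos=np.array([0.42, 0.05, 0.11]),
                           obj_quat=np.array([0.0, 0.0, 0.0, 1.0]),
                           ref_pos=np.array([0.40, 0.05, 0.10]),
                           ref_quat=np.array([0.0, 0.0, 0.0, 1.0]))
```

In this sketch, the reference pose would be taken from the object trajectory extracted from the RGB-D video; the second component described in the abstract, the pre-manipulation hand pose, is used to initialize and guide exploration rather than to define the reward.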

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-lum25a,
  title     = {Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration},
  author    = {Lum, Tyler Ga Wei and Lee, Olivia Y. and Liu, Karen and Bohg, Jeannette},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {4418--4441},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/lum25a/lum25a.pdf},
  url       = {https://proceedings.mlr.press/v305/lum25a.html},
  abstract  = {Teaching robots dexterous manipulation skills often requires collecting hundreds of demonstrations using wearables or teleoperation, a process that is challenging to scale. Videos of human-object interactions are easier to collect and scale, but leveraging them directly for robot learning is difficult due to the lack of explicit action labels and human-robot embodiment differences. We propose Human2Sim2Robot, a novel real-to-sim-to-real framework for training dexterous manipulation policies using only one RGB-D video of a human demonstrating a task. Our method utilizes reinforcement learning (RL) in simulation to cross the embodiment gap without relying on wearables, teleoperation, or large-scale data collection. From the video, we extract: (1) the object pose trajectory to define an object-centric, embodiment-agnostic reward, and (2) the pre-manipulation hand pose to initialize and guide exploration during RL training. These components enable effective policy learning without any task-specific reward tuning. In the single human demo regime, Human2Sim2Robot outperforms object-aware replay by over 55% and imitation learning by over 68% on grasping, non-prehensile manipulation, and multi-step tasks. Website: https://human2sim2robot.github.io}
}
Endnote
%0 Conference Paper
%T Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
%A Tyler Ga Wei Lum
%A Olivia Y. Lee
%A Karen Liu
%A Jeannette Bohg
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-lum25a
%I PMLR
%P 4418--4441
%U https://proceedings.mlr.press/v305/lum25a.html
%V 305
%X Teaching robots dexterous manipulation skills often requires collecting hundreds of demonstrations using wearables or teleoperation, a process that is challenging to scale. Videos of human-object interactions are easier to collect and scale, but leveraging them directly for robot learning is difficult due to the lack of explicit action labels and human-robot embodiment differences. We propose Human2Sim2Robot, a novel real-to-sim-to-real framework for training dexterous manipulation policies using only one RGB-D video of a human demonstrating a task. Our method utilizes reinforcement learning (RL) in simulation to cross the embodiment gap without relying on wearables, teleoperation, or large-scale data collection. From the video, we extract: (1) the object pose trajectory to define an object-centric, embodiment-agnostic reward, and (2) the pre-manipulation hand pose to initialize and guide exploration during RL training. These components enable effective policy learning without any task-specific reward tuning. In the single human demo regime, Human2Sim2Robot outperforms object-aware replay by over 55% and imitation learning by over 68% on grasping, non-prehensile manipulation, and multi-step tasks. Website: https://human2sim2robot.github.io
APA
Lum, T. G. W., Lee, O. Y., Liu, K., & Bohg, J. (2025). Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:4418-4441. Available from https://proceedings.mlr.press/v305/lum25a.html.
