Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware

Justin Yu, Letian Fu, Huang Huang, Karim El-Refai, Rares Andrei Ambrus, Richard Cheng, Muhammad Zubair Irshad, Ken Goldberg
Proceedings of The 9th Conference on Robot Learning, PMLR 305:547-577, 2025.

Abstract

Scaling robot learning requires vast and diverse datasets. Yet the prevailing data collection paradigm—human teleoperation—remains costly and constrained by manual effort and physical robot access. We introduce Real2Render2Real (R2R2R), a novel approach for generating robot training data without relying on object dynamics simulation or teleoperation of robot hardware. The input is a smartphone-captured scan of one or more objects and a single video of a human demonstration. R2R2R renders thousands of high visual fidelity robot-agnostic demonstrations by reconstructing detailed 3D object geometry and appearance, and tracking 6-DoF object motion. R2R2R uses 3D Gaussian Splatting (3DGS) to enable flexible asset generation and trajectory synthesis for both rigid and articulated objects, converting these representations to meshes to maintain compatibility with scalable rendering engines like IsaacLab but with collision-modeling turned off. Robot demonstration data generated by R2R2R integrates directly with models that operate on robot proprioceptive states and image observations, such as vision-language-action models (VLA) and imitation learning policies. Physical experiments suggest that models trained on R2R2R data from a single human demonstration can match the performance of models trained on 150 human teleoperation demonstrations.
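To make the trajectory-synthesis step more concrete, the short Python sketch below illustrates one plausible way to reuse a single tracked 6-DoF object trajectory under a new object placement, the kind of operation the pipeline performs before rendering many demonstrations from one captured motion. It is a minimal NumPy-only sketch; the function names, the rigid-retargeting rule, and the toy data are assumptions for illustration, not the authors' implementation.

import numpy as np


def relative_motion(poses):
    # poses: (T, 4, 4) world-frame homogeneous transforms of the tracked object.
    # Returns each pose expressed relative to the first frame: inv(T_0) @ T_t.
    first_inv = np.linalg.inv(poses[0])
    return np.einsum("ij,tjk->tik", first_inv, poses)


def retarget_trajectory(poses, new_initial_pose):
    # Replay the captured relative motion starting from a new initial object pose.
    # (Hypothetical retargeting rule for illustration; the paper's synthesis
    # procedure is not reproduced here.)
    return np.einsum("ij,tjk->tik", new_initial_pose, relative_motion(poses))


if __name__ == "__main__":
    # Toy tracked trajectory: the object slides 10 cm along +x over 5 frames.
    poses = np.tile(np.eye(4), (5, 1, 1))
    poses[:, 0, 3] = np.linspace(0.0, 0.10, 5)

    # New scene configuration: the object starts 20 cm along +y instead.
    new_start = np.eye(4)
    new_start[1, 3] = 0.20

    synthesized = retarget_trajectory(poses, new_start)
    print(synthesized[-1][:3, 3])  # -> approximately [0.10, 0.20, 0.00]

Each retargeted pose sequence can then be rendered with the reconstructed assets to produce an image-and-proprioception demonstration for policy training.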

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-yu25a,
  title     = {Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware},
  author    = {Yu, Justin and Fu, Letian and Huang, Huang and El-Refai, Karim and Ambrus, Rares Andrei and Cheng, Richard and Irshad, Muhammad Zubair and Goldberg, Ken},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {547--577},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/yu25a/yu25a.pdf},
  url       = {https://proceedings.mlr.press/v305/yu25a.html},
  abstract  = {Scaling robot learning requires vast and diverse datasets. Yet the prevailing data collection paradigm—human teleoperation—remains costly and constrained by manual effort and physical robot access. We introduce Real2Render2Real (R2R2R), a novel approach for generating robot training data without relying on object dynamics simulation or teleoperation of robot hardware. The input is a smartphone-captured scan of one or more objects and a single video of a human demonstration. R2R2R renders thousands of high visual fidelity robot-agnostic demonstrations by reconstructing detailed 3D object geometry and appearance, and tracking 6-DoF object motion. R2R2R uses 3D Gaussian Splatting (3DGS) to enable flexible asset generation and trajectory synthesis for both rigid and articulated objects, converting these representations to meshes to maintain compatibility with scalable rendering engines like IsaacLab but with collision-modeling turned off. Robot demonstration data generated by R2R2R integrates directly with models that operate on robot proprioceptive states and image observations, such as vision-language-action models (VLA) and imitation learning policies. Physical experiments suggest that models trained on R2R2R data from a single human demonstration can match the performance of models trained on 150 human teleoperation demonstrations.}
}
Endnote
%0 Conference Paper
%T Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware
%A Justin Yu
%A Letian Fu
%A Huang Huang
%A Karim El-Refai
%A Rares Andrei Ambrus
%A Richard Cheng
%A Muhammad Zubair Irshad
%A Ken Goldberg
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-yu25a
%I PMLR
%P 547--577
%U https://proceedings.mlr.press/v305/yu25a.html
%V 305
%X Scaling robot learning requires vast and diverse datasets. Yet the prevailing data collection paradigm—human teleoperation—remains costly and constrained by manual effort and physical robot access. We introduce Real2Render2Real (R2R2R), a novel approach for generating robot training data without relying on object dynamics simulation or teleoperation of robot hardware. The input is a smartphone-captured scan of one or more objects and a single video of a human demonstration. R2R2R renders thousands of high visual fidelity robot-agnostic demonstrations by reconstructing detailed 3D object geometry and appearance, and tracking 6-DoF object motion. R2R2R uses 3D Gaussian Splatting (3DGS) to enable flexible asset generation and trajectory synthesis for both rigid and articulated objects, converting these representations to meshes to maintain compatibility with scalable rendering engines like IsaacLab but with collision-modeling turned off. Robot demonstration data generated by R2R2R integrates directly with models that operate on robot proprioceptive states and image observations, such as vision-language-action models (VLA) and imitation learning policies. Physical experiments suggest that models trained on R2R2R data from a single human demonstration can match the performance of models trained on 150 human teleoperation demonstrations.
APA
Yu, J., Fu, L., Huang, H., El-Refai, K., Ambrus, R.A., Cheng, R., Irshad, M.Z. & Goldberg, K. (2025). Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:547-577. Available from https://proceedings.mlr.press/v305/yu25a.html.
