DexScale: Automating Data Scaling for Sim2Real Generalizable Robot Control

Guiliang Liu, Yueci Deng, Runyi Zhao, Huayi Zhou, Jian Chen, Jietao Chen, Ruiyan Xu, Yunxin Tai, Kui Jia
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:38313-38331, 2025.

Abstract

A critical prerequisite for achieving generalizable robot control is the availability of a large-scale robot training dataset. Due to the expense of collecting realistic robotic data, recent studies have explored simulating and recording robot skills in virtual environments. While simulated data can be generated at higher speeds, lower costs, and larger scales, the applicability of such simulated data remains questionable due to the gap between simulated and realistic environments. To advance Sim2Real generalization, in this study, we present DexScale, a data engine designed to perform automatic skill simulation and scaling for learning deployable robot manipulation policies. Specifically, DexScale ensures the usability of simulated skills by integrating diverse forms of realistic data into the simulated environment, preserving semantic alignment with the target tasks. For each simulated skill in the environment, DexScale facilitates effective Sim2Real data scaling by automating the process of domain randomization and adaptation. Tuned on the scaled dataset, the control policy achieves zero-shot Sim2Real generalization across diverse tasks, multiple robot embodiments, and widely studied policy model architectures, highlighting DexScale's importance in advancing Sim2Real embodied intelligence.
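The abstract names automated domain randomization as the mechanism DexScale uses to scale each simulated skill. For reference only, the sketch below shows the generic technique. A single simulated skill is expanded into many episodes, and for each episode the visual and physical scene parameters are independently resampled from fixed ranges. Everything in it is hypothetical and is not DexScale's implementation or API: `SceneParams`, the parameter ranges, the texture names, and `scale_skill`. The adaptation step the paper describes is not modeled.

```python
import random
from dataclasses import dataclass

# Hypothetical asset names; a real pipeline would reference its own texture library.
TEXTURES = ["wood", "metal", "cloth", "marble"]


@dataclass(frozen=True)
class SceneParams:
    """Hypothetical per-episode scene parameters (not DexScale's schema)."""
    light_intensity: float  # relative scale, 1.0 = nominal lighting
    table_texture: str      # one of TEXTURES
    camera_yaw_deg: float   # camera yaw offset in degrees
    object_friction: float  # Coulomb friction coefficient of the manipulated object


def sample_scene(rng: random.Random) -> SceneParams:
    # Uniform domain randomization: each factor is drawn independently from a fixed range.
    return SceneParams(
        light_intensity=rng.uniform(0.5, 1.5),
        table_texture=rng.choice(TEXTURES),
        camera_yaw_deg=rng.uniform(-10.0, 10.0),
        object_friction=rng.uniform(0.3, 1.0),
    )


def scale_skill(skill_id: str, num_variants: int, seed: int = 0) -> list[tuple[str, SceneParams]]:
    """Expand one simulated skill into num_variants randomized episode configurations."""
    rng = random.Random(seed)  # seeded so the scaled dataset is reproducible
    return [(skill_id, sample_scene(rng)) for _ in range(num_variants)]


if __name__ == "__main__":
    for skill, params in scale_skill("pick_cup", num_variants=3, seed=42):
        print(skill, params)
```

Seeding a private `random.Random` instance, rather than the global generator, keeps each skill's randomized variants reproducible. Replaying a recorded skill under every sampled configuration would yield the multiplied dataset.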

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liu25k,
  title = {{D}ex{S}cale: Automating Data Scaling for {S}im2{R}eal Generalizable Robot Control},
  author = {Liu, Guiliang and Deng, Yueci and Zhao, Runyi and Zhou, Huayi and Chen, Jian and Chen, Jietao and Xu, Ruiyan and Tai, Yunxin and Jia, Kui},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {38313--38331},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25k/liu25k.pdf},
  url = {https://proceedings.mlr.press/v267/liu25k.html},
  abstract = {A critical prerequisite for achieving generalizable robot control is the availability of a large-scale robot training dataset. Due to the expense of collecting realistic robotic data, recent studies explored simulating and recording robot skills in virtual environments. While simulated data can be generated at higher speeds, lower costs, and larger scales, the applicability of such simulated data remains questionable due to the gap between simulated and realistic environments. To advance the Sim2Real generalization, in this study, we present DexScale, a data engine designed to perform automatic skills simulation and scaling for learning deployable robot manipulation policies. Specifically, DexScale ensures the usability of simulated skills by integrating diverse forms of realistic data into the simulated environment, preserving semantic alignment with the target tasks. For each simulated skill in the environment, DexScale facilitates effective Sim2Real data scaling by automating the process of domain randomization and adaptation. Tuned by the scaled dataset, the control policy achieves zero-shot Sim2Real generalization across diverse tasks, multiple robot embodiments, and widely studied policy model architectures, highlighting its importance in advancing Sim2Real embodied intelligence.}
}
Endnote
%0 Conference Paper
%T DexScale: Automating Data Scaling for Sim2Real Generalizable Robot Control
%A Guiliang Liu
%A Yueci Deng
%A Runyi Zhao
%A Huayi Zhou
%A Jian Chen
%A Jietao Chen
%A Ruiyan Xu
%A Yunxin Tai
%A Kui Jia
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liu25k
%I PMLR
%P 38313--38331
%U https://proceedings.mlr.press/v267/liu25k.html
%V 267
%X A critical prerequisite for achieving generalizable robot control is the availability of a large-scale robot training dataset. Due to the expense of collecting realistic robotic data, recent studies explored simulating and recording robot skills in virtual environments. While simulated data can be generated at higher speeds, lower costs, and larger scales, the applicability of such simulated data remains questionable due to the gap between simulated and realistic environments. To advance the Sim2Real generalization, in this study, we present DexScale, a data engine designed to perform automatic skills simulation and scaling for learning deployable robot manipulation policies. Specifically, DexScale ensures the usability of simulated skills by integrating diverse forms of realistic data into the simulated environment, preserving semantic alignment with the target tasks. For each simulated skill in the environment, DexScale facilitates effective Sim2Real data scaling by automating the process of domain randomization and adaptation. Tuned by the scaled dataset, the control policy achieves zero-shot Sim2Real generalization across diverse tasks, multiple robot embodiments, and widely studied policy model architectures, highlighting its importance in advancing Sim2Real embodied intelligence.
APA
Liu, G., Deng, Y., Zhao, R., Zhou, H., Chen, J., Chen, J., Xu, R., Tai, Y. & Jia, K. (2025). DexScale: Automating Data Scaling for Sim2Real Generalizable Robot Control. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:38313-38331. Available from https://proceedings.mlr.press/v267/liu25k.html.
