Evaluating Real-World Robot Manipulation Policies in Simulation

Xuanlin Li, Kyle Hsu, Jiayuan Gu, Oier Mees, Karl Pertsch, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao
Proceedings of The 8th Conference on Robot Learning, PMLR 270:3705-3728, 2025.

Abstract

The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, issues that are likely to worsen as policies broaden the spectrum of tasks they can perform. In this work, we demonstrate that simulation-based evaluation can be a scalable, reproducible, and reliable proxy for real-world evaluation. We identify control and visual disparities between real and simulated environments as key challenges for reliable simulated evaluation and propose approaches for mitigating these gaps without needing to painstakingly craft full-fidelity digital twins. We then employ these techniques to create SIMPLER, a collection of simulated environments for policy evaluation on common real robot manipulation setups. Through over 1500 paired sim-and-real evaluations of manipulation policies across two embodiments and eight task families, we demonstrate strong correlation between policy performance in SIMPLER environments and that in the real world. Beyond aggregated trends, we find that SIMPLER evaluations effectively reflect the real-world behaviors of individual policies, such as sensitivity to various distribution shifts. We are committed to open-sourcing all SIMPLER environments along with our workflow for creating new environments to facilitate research on general-purpose manipulation policies and simulated evaluation frameworks. Website: https://simpler-env.github.io/

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-li25c,
  title = {Evaluating Real-World Robot Manipulation Policies in Simulation},
  author = {Li, Xuanlin and Hsu, Kyle and Gu, Jiayuan and Mees, Oier and Pertsch, Karl and Walke, Homer Rich and Fu, Chuyuan and Lunawat, Ishikaa and Sieh, Isabel and Kirmani, Sean and Levine, Sergey and Wu, Jiajun and Finn, Chelsea and Su, Hao and Vuong, Quan and Xiao, Ted},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages = {3705--3728},
  year = {2025},
  editor = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = {270},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Nov},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/li25c/li25c.pdf},
  url = {https://proceedings.mlr.press/v270/li25c.html},
  abstract = {The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, issues that are likely to worsen as policies broaden the spectrum of tasks they can perform. In this work, we demonstrate that simulation-based evaluation can be a scalable, reproducible, and reliable proxy for real-world evaluation. We identify control and visual disparities between real and simulated environments as key challenges for reliable simulated evaluation and propose approaches for mitigating these gaps without needing to painstakingly craft full-fidelity digital twins. We then employ these techniques to create SIMPLER, a collection of simulated environments for policy evaluation on common real robot manipulation setups. Through over 1500 paired sim-and-real evaluations of manipulation policies across two embodiments and eight task families, we demonstrate strong correlation between policy performance in SIMPLER environments and that in the real world. Beyond aggregated trends, we find that SIMPLER evaluations effectively reflect the real-world behaviors of individual policies, such as sensitivity to various distribution shifts. We are committed to open-sourcing all SIMPLER environments along with our workflow for creating new environments to facilitate research on general-purpose manipulation policies and simulated evaluation frameworks. Website: https://simpler-env.github.io/}
}
Endnote
%0 Conference Paper
%T Evaluating Real-World Robot Manipulation Policies in Simulation
%A Xuanlin Li
%A Kyle Hsu
%A Jiayuan Gu
%A Oier Mees
%A Karl Pertsch
%A Homer Rich Walke
%A Chuyuan Fu
%A Ishikaa Lunawat
%A Isabel Sieh
%A Sean Kirmani
%A Sergey Levine
%A Jiajun Wu
%A Chelsea Finn
%A Hao Su
%A Quan Vuong
%A Ted Xiao
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-li25c
%I PMLR
%P 3705--3728
%U https://proceedings.mlr.press/v270/li25c.html
%V 270
%X The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, issues that are likely to worsen as policies broaden the spectrum of tasks they can perform. In this work, we demonstrate that simulation-based evaluation can be a scalable, reproducible, and reliable proxy for real-world evaluation. We identify control and visual disparities between real and simulated environments as key challenges for reliable simulated evaluation and propose approaches for mitigating these gaps without needing to painstakingly craft full-fidelity digital twins. We then employ these techniques to create SIMPLER, a collection of simulated environments for policy evaluation on common real robot manipulation setups. Through over 1500 paired sim-and-real evaluations of manipulation policies across two embodiments and eight task families, we demonstrate strong correlation between policy performance in SIMPLER environments and that in the real world. Beyond aggregated trends, we find that SIMPLER evaluations effectively reflect the real-world behaviors of individual policies, such as sensitivity to various distribution shifts. We are committed to open-sourcing all SIMPLER environments along with our workflow for creating new environments to facilitate research on general-purpose manipulation policies and simulated evaluation frameworks. Website: https://simpler-env.github.io/
APA
Li, X., Hsu, K., Gu, J., Mees, O., Pertsch, K., Walke, H.R., Fu, C., Lunawat, I., Sieh, I., Kirmani, S., Levine, S., Wu, J., Finn, C., Su, H., Vuong, Q., & Xiao, T. (2025). Evaluating Real-World Robot Manipulation Policies in Simulation. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:3705-3728. Available from https://proceedings.mlr.press/v270/li25c.html.
