Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning

Albert Wilcox, Mohamed Ghanem, Masoud Moghani, Pierre Barroso, Benjamin Joffe, Animesh Garg
Proceedings of The 9th Conference on Robot Learning, PMLR 305:1486-1514, 2025.

Abstract

Imitation Learning can train robots to perform complex and diverse manipulation tasks, but learned policies are brittle to observations outside of the training distribution. 3D scene representations that incorporate observations from calibrated RGBD cameras have been proposed as a way to mitigate this, but in our evaluations with unseen embodiments and camera viewpoints they show only modest improvement. To address those challenges, we propose Adapt3R, a general-purpose 3D observation encoder which synthesizes data from calibrated RGBD cameras into a vector that can be used as conditioning for arbitrary IL algorithms. The key idea is to use a pretrained 2D backbone to extract semantic information, using 3D only as a medium to localize this information with respect to the end-effector. We show across 93 simulated and 6 real tasks that when trained end-to-end with a variety of IL algorithms, Adapt3R maintains these algorithms’ learning capacity while enabling zero-shot transfer to novel embodiments and camera poses. For more results, visit https://adapt3r-robot.github.io.

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-wilcox25a,
  title     = {Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning},
  author    = {Wilcox, Albert and Ghanem, Mohamed and Moghani, Masoud and Barroso, Pierre and Joffe, Benjamin and Garg, Animesh},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {1486--1514},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/wilcox25a/wilcox25a.pdf},
  url       = {https://proceedings.mlr.press/v305/wilcox25a.html},
  abstract  = {Imitation Learning can train robots to perform complex and diverse manipulation tasks, but learned policies are brittle to observations outside of the training distribution. 3D scene representations that incorporate observations from calibrated RGBD cameras have been proposed as a way to mitigate this, but in our evaluations with unseen embodiments and camera viewpoints they show only modest improvement. To address those challenges, we propose Adapt3R, a general-purpose 3D observation encoder which synthesizes data from calibrated RGBD cameras into a vector that can be used as conditioning for arbitrary IL algorithms. The key idea is to use a pretrained 2D backbone to extract semantic information, using 3D only as a medium to localize this information with respect to the end-effector. We show across 93 simulated and 6 real tasks that when trained end-to-end with a variety of IL algorithms, Adapt3R maintains these algorithms’ learning capacity while enabling zero-shot transfer to novel embodiments and camera poses. For more results, visit https://adapt3r-robot.github.io.}
}
Endnote
%0 Conference Paper
%T Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning
%A Albert Wilcox
%A Mohamed Ghanem
%A Masoud Moghani
%A Pierre Barroso
%A Benjamin Joffe
%A Animesh Garg
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-wilcox25a
%I PMLR
%P 1486--1514
%U https://proceedings.mlr.press/v305/wilcox25a.html
%V 305
%X Imitation Learning can train robots to perform complex and diverse manipulation tasks, but learned policies are brittle to observations outside of the training distribution. 3D scene representations that incorporate observations from calibrated RGBD cameras have been proposed as a way to mitigate this, but in our evaluations with unseen embodiments and camera viewpoints they show only modest improvement. To address those challenges, we propose Adapt3R, a general-purpose 3D observation encoder which synthesizes data from calibrated RGBD cameras into a vector that can be used as conditioning for arbitrary IL algorithms. The key idea is to use a pretrained 2D backbone to extract semantic information, using 3D only as a medium to localize this information with respect to the end-effector. We show across 93 simulated and 6 real tasks that when trained end-to-end with a variety of IL algorithms, Adapt3R maintains these algorithms’ learning capacity while enabling zero-shot transfer to novel embodiments and camera poses. For more results, visit https://adapt3r-robot.github.io.
APA
Wilcox, A., Ghanem, M., Moghani, M., Barroso, P., Joffe, B. & Garg, A. (2025). Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:1486-1514. Available from https://proceedings.mlr.press/v305/wilcox25a.html.

Related Material