GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang
Proceedings of The 7th Conference on Robot Learning, PMLR 229:284-301, 2023.

Abstract

It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present GNFactor, a visual behavior cloning agent for multi-task robotic manipulation with Generalizable Neural feature Fields. GNFactor jointly optimizes a neural radiance field (NeRF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module leverages a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel representation. We evaluate GNFactor on 3 real-robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods on both seen and unseen tasks, demonstrating its strong generalization ability. Project website: https://yanjieze.com/GNFactor/
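To make the architecture described in the abstract more concrete, below is a minimal PyTorch-style sketch of the joint objective: a shared deep 3D voxel representation is optimized both by a neural-field reconstruction branch that distills vision-language features and by a policy branch trained with behavior cloning. Everything here is an illustrative assumption rather than the authors' implementation: the module names, the 10-channel voxel input, the MLP stand-in for the Perceiver Transformer, and the loss weight lambda_recon are all hypothetical.

    # Minimal sketch (illustrative only) of the joint objective described in
    # the abstract. All names, channel counts, and the weight lambda_recon
    # are assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class GNFactorSketch(nn.Module):
        def __init__(self, voxel_dim=64, feat_dim=512, action_dim=8):
            super().__init__()
            # Shared deep 3D voxel representation, encoded from voxelized
            # RGB-D input (10 input channels assumed here).
            self.voxel_encoder = nn.Sequential(
                nn.Conv3d(10, voxel_dim, 3, padding=1), nn.ReLU(),
                nn.Conv3d(voxel_dim, voxel_dim, 3, padding=1),
            )
            # Reconstruction branch: per-3D-point RGB + density + a distilled
            # vision-language feature (NeRF-style volumetric rendering is
            # omitted for brevity).
            self.recon_head = nn.Sequential(
                nn.Linear(voxel_dim + 3, 256), nn.ReLU(),
                nn.Linear(256, 3 + 1 + feat_dim),  # rgb + sigma + feature
            )
            # Decision branch: a pooled MLP standing in for the Perceiver
            # Transformer over the shared voxels.
            self.policy_head = nn.Sequential(
                nn.AdaptiveAvgPool3d(4), nn.Flatten(),
                nn.Linear(voxel_dim * 4 ** 3, 256), nn.ReLU(),
                nn.Linear(256, action_dim),
            )

        def forward(self, obs_voxels, query_points):
            # obs_voxels: (B, 10, D, H, W); query_points: (B, N, 3) with
            # coordinates normalized to [-1, 1] for grid_sample.
            voxels = self.voxel_encoder(obs_voxels)
            # Trilinear lookup of shared voxel features at the query points.
            feats = nn.functional.grid_sample(
                voxels, query_points[:, None, None],  # grid: (B, 1, 1, N, 3)
                align_corners=True,
            ).squeeze(2).squeeze(2).transpose(1, 2)   # -> (B, N, C)
            recon = self.recon_head(torch.cat([feats, query_points], dim=-1))
            action = self.policy_head(voxels)
            return recon, action

    def joint_loss(recon, recon_target, action, expert_action, lambda_recon=0.1):
        # Behavior cloning term plus weighted reconstruction/distillation term.
        bc = nn.functional.mse_loss(action, expert_action)
        rec = nn.functional.mse_loss(recon, recon_target)
        return bc + lambda_recon * rec

    # Example forward/backward pass with random tensors:
    model = GNFactorSketch()
    obs = torch.randn(2, 10, 32, 32, 32)
    pts = torch.rand(2, 128, 3) * 2 - 1
    recon, action = model(obs, pts)
    loss = joint_loss(recon, torch.randn_like(recon), action, torch.randn(2, 8))
    loss.backward()

The design choice this mirrors is that reconstruction serves purely as an auxiliary signal: at test time only the voxel encoder and policy branch are needed, while the NeRF-style branch shapes the shared voxel features during training.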

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-ze23a,
  title     = {GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields},
  author    = {Ze, Yanjie and Yan, Ge and Wu, Yueh-Hua and Macaluso, Annabella and Ge, Yuying and Ye, Jianglong and Hansen, Nicklas and Li, Li Erran and Wang, Xiaolong},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {284--301},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/ze23a/ze23a.pdf},
  url       = {https://proceedings.mlr.press/v229/ze23a.html},
  abstract  = {It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present GNFactor, a visual behavior cloning agent for multi-task robotic manipulation with Generalizable Neural feature Fields. GNFactor jointly optimizes a neural radiance field (NeRF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module leverages a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel representation. We evaluate GNFactor on 3 real-robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods on both seen and unseen tasks, demonstrating its strong generalization ability. Project website: https://yanjieze.com/GNFactor/}
}
Endnote
%0 Conference Paper
%T GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields
%A Yanjie Ze
%A Ge Yan
%A Yueh-Hua Wu
%A Annabella Macaluso
%A Yuying Ge
%A Jianglong Ye
%A Nicklas Hansen
%A Li Erran Li
%A Xiaolong Wang
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-ze23a
%I PMLR
%P 284--301
%U https://proceedings.mlr.press/v229/ze23a.html
%V 229
%X It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present GNFactor, a visual behavior cloning agent for multi-task robotic manipulation with Generalizable Neural feature Fields. GNFactor jointly optimizes a neural radiance field (NeRF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module leverages a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel representation. We evaluate GNFactor on 3 real-robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods on both seen and unseen tasks, demonstrating its strong generalization ability. Project website: https://yanjieze.com/GNFactor/
APA
Ze, Y., Yan, G., Wu, Y., Macaluso, A., Ge, Y., Ye, J., Hansen, N., Li, L.E. & Wang, X. (2023). GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:284-301. Available from https://proceedings.mlr.press/v229/ze23a.html.
