PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation

Shizhe Chen, Ricardo Garcia Pinel, Cordelia Schmid, Ivan Laptev
Proceedings of The 7th Conference on Robot Learning, PMLR 229:1761-1781, 2023.

Abstract

The ability for robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics. The dominant approaches for language-guided manipulation use 2D image representations, which face difficulties in combining multi-view cameras and inferring precise 3D positions and relationships. To address these limitations, we propose a 3D point cloud based policy called PolarNet for language-guided manipulation. It leverages carefully designed point cloud inputs, efficient point cloud encoders, and multimodal transformers to learn 3D point cloud representations and integrate them with language instructions for action prediction. PolarNet is shown to be effective and data efficient in a variety of experiments conducted on the RLBench benchmark. It outperforms state-of-the-art 2D and 3D approaches in both single-task and multi-task learning. It also achieves promising results on a real robot.
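To make the recipe in the abstract concrete, below is a minimal, illustrative sketch in PyTorch of a point-cloud-plus-language policy: per-point features are encoded, fused with instruction embeddings in a multimodal transformer, and mapped to a gripper action. All module names, dimensions, and the action parameterization here are assumptions for illustration only, not the authors' implementation.

```python
# Hypothetical sketch of a point-cloud + language manipulation policy.
# Assumed: xyz+rgb point features, pre-computed language token embeddings
# (e.g. from a frozen text encoder), and a position/quaternion/gripper action.
import torch
import torch.nn as nn


class PointCloudLanguagePolicy(nn.Module):
    def __init__(self, point_feat_dim=6, lang_dim=512, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        # Per-point shared MLP (PointNet-style); stands in for the paper's
        # more efficient point cloud encoder.
        self.point_encoder = nn.Sequential(
            nn.Linear(point_feat_dim, 128), nn.ReLU(),
            nn.Linear(128, d_model),
        )
        # Project language token embeddings into the shared model dimension.
        self.lang_proj = nn.Linear(lang_dim, d_model)
        # Multimodal transformer over concatenated point and language tokens.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Predict 3D position (3), rotation quaternion (4), gripper open/close (1).
        self.action_head = nn.Linear(d_model, 3 + 4 + 1)

    def forward(self, points, lang_tokens):
        # points: (B, N, point_feat_dim); lang_tokens: (B, L, lang_dim)
        pt_tokens = self.point_encoder(points)            # (B, N, d_model)
        lg_tokens = self.lang_proj(lang_tokens)           # (B, L, d_model)
        fused = self.fusion(torch.cat([pt_tokens, lg_tokens], dim=1))
        pooled = fused.mean(dim=1)                        # simple global pooling
        out = self.action_head(pooled)
        pos, quat, grip = out[:, :3], out[:, 3:7], out[:, 7:]
        quat = quat / quat.norm(dim=-1, keepdim=True)     # unit quaternion
        return pos, quat, torch.sigmoid(grip)


# Toy usage: 2 point clouds of 2048 points (xyz+rgb) and 16 language tokens each.
policy = PointCloudLanguagePolicy()
pos, quat, grip = policy(torch.randn(2, 2048, 6), torch.randn(2, 16, 512))
print(pos.shape, quat.shape, grip.shape)  # (2, 3) (2, 4) (2, 1)
```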

Cite this Paper


BibTeX
@InProceedings{pmlr-v229-chen23b,
  title     = {PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation},
  author    = {Chen, Shizhe and Pinel, Ricardo Garcia and Schmid, Cordelia and Laptev, Ivan},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {1761--1781},
  year      = {2023},
  editor    = {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v229/chen23b/chen23b.pdf},
  url       = {https://proceedings.mlr.press/v229/chen23b.html},
  abstract  = {The ability for robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics. The dominant approaches for language-guided manipulation use 2D image representations, which face difficulties in combining multi-view cameras and inferring precise 3D positions and relationships. To address these limitations, we propose a 3D point cloud based policy called PolarNet for language-guided manipulation. It leverages carefully designed point cloud inputs, efficient point cloud encoders, and multimodal transformers to learn 3D point cloud representations and integrate them with language instructions for action prediction. PolarNet is shown to be effective and data efficient in a variety of experiments conducted on the RLBench benchmark. It outperforms state-of-the-art 2D and 3D approaches in both single-task and multi-task learning. It also achieves promising results on a real robot.}
}
Endnote
%0 Conference Paper
%T PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation
%A Shizhe Chen
%A Ricardo Garcia Pinel
%A Cordelia Schmid
%A Ivan Laptev
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-chen23b
%I PMLR
%P 1761--1781
%U https://proceedings.mlr.press/v229/chen23b.html
%V 229
%X The ability for robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics. The dominant approaches for language-guided manipulation use 2D image representations, which face difficulties in combining multi-view cameras and inferring precise 3D positions and relationships. To address these limitations, we propose a 3D point cloud based policy called PolarNet for language-guided manipulation. It leverages carefully designed point cloud inputs, efficient point cloud encoders, and multimodal transformers to learn 3D point cloud representations and integrate them with language instructions for action prediction. PolarNet is shown to be effective and data efficient in a variety of experiments conducted on the RLBench benchmark. It outperforms state-of-the-art 2D and 3D approaches in both single-task and multi-task learning. It also achieves promising results on a real robot.
APA
Chen, S., Pinel, R.G., Schmid, C. & Laptev, I. (2023). PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:1761-1781. Available from https://proceedings.mlr.press/v229/chen23b.html.
