GraspSplats: Efficient Manipulation with 3D Feature Splatting

Mazeyu Ji, Ri-Zhao Qiu, Xueyan Zou, Xiaolong Wang
Proceedings of The 8th Conference on Robot Learning, PMLR 270:1443-1460, 2025.

Abstract

The ability of robots to perform efficient, zero-shot grasping of object parts is crucial for practical applications and is becoming attainable with recent advances in Vision-Language Models (VLMs). To bridge the 2D-to-3D gap for representations that support such a capability, existing methods rely on neural fields (NeRFs) via differentiable rendering or on point-based projection methods. However, we demonstrate that NeRFs are inappropriate for scene changes due to their implicit representation, and that point-based methods are inaccurate for part localization without rendering-based optimization. To address these issues, we propose GraspSplats. Using depth supervision and a novel reference feature computation method, GraspSplats generates high-quality scene representations in under 60 seconds. We further validate the advantages of a Gaussian-based representation by showing that the explicit and optimized geometry in GraspSplats is sufficient to natively support (1) real-time grasp sampling and (2) dynamic and articulated object manipulation with point trackers. With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods under diverse task settings. In particular, GraspSplats outperforms NeRF-based methods such as F3RM and LERF-TOGO, as well as 2D detection methods. The code will be released.
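As a rough illustration of the language-conditioned part localization that the abstract describes, the sketch below thresholds cosine similarity between per-Gaussian distilled features and a text embedding to select candidate Gaussians for grasp sampling. The tensor shapes, the localize_part helper, and the threshold value are illustrative assumptions, not the authors' released implementation.

# Minimal, hypothetical sketch of querying a feature-splatting scene with a text embedding.
# Placeholders stand in for a trained scene and a CLIP-style text encoder.
import torch
import torch.nn.functional as F

def localize_part(gaussian_means: torch.Tensor,   # (N, 3) Gaussian centers
                  gaussian_feats: torch.Tensor,   # (N, D) distilled features per Gaussian
                  text_embedding: torch.Tensor,   # (D,) text embedding, e.g. for "mug handle"
                  threshold: float = 0.25):
    """Return centers and scores of Gaussians whose features match the text query."""
    feats = F.normalize(gaussian_feats, dim=-1)
    query = F.normalize(text_embedding, dim=0)
    sim = feats @ query                           # cosine similarity per Gaussian
    mask = sim > threshold
    return gaussian_means[mask], sim[mask]

# Usage with random placeholders in place of an optimized scene and real text features.
means = torch.randn(10_000, 3)
feats = torch.randn(10_000, 512)
query = torch.randn(512)
part_points, scores = localize_part(means, feats, query)
print(part_points.shape, scores.shape)

In the actual system, the selected Gaussian centers (explicit geometry) would then be passed to a grasp sampler, which is what enables the real-time grasp sampling claimed in the abstract.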

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-ji25a,
  title     = {GraspSplats: Efficient Manipulation with 3D Feature Splatting},
  author    = {Ji, Mazeyu and Qiu, Ri-Zhao and Zou, Xueyan and Wang, Xiaolong},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {1443--1460},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/ji25a/ji25a.pdf},
  url       = {https://proceedings.mlr.press/v270/ji25a.html},
  abstract  = {The ability for robots to perform efficient and zero-shot grasping of object parts is crucial for practical applications and is becoming prevalent with recent advances in Vision-Language Models (VLMs). To bridge the 2D-to-3D gap for representations to support such a capability, existing methods rely on neural fields (NeRFs) via differentiable rendering or point-based projection methods. However, we demonstrate that NeRFs are inappropriate for scene changes due to its implicitness and point-based methods are inaccurate for part localization without rendering-based optimization. To amend these issues, we propose GraspSplats. Using depth supervision and a novel reference feature computation method, GraspSplats can generate high-quality scene representations under 60 seconds. We further validate the advantages of Gaussian-based representation by showing that the explicit and optimized geometry in GraspSplats is sufficient to natively support (1) real-time grasp sampling and (2) dynamic and articulated object manipulation with point trackers. With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods under diverse task settings. In particular, GraspSplats outperforms NeRF-based methods like F3RM and LERF-TOGO, and 2D detection methods. The code will be released.}
}
Endnote
%0 Conference Paper
%T GraspSplats: Efficient Manipulation with 3D Feature Splatting
%A Mazeyu Ji
%A Ri-Zhao Qiu
%A Xueyan Zou
%A Xiaolong Wang
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-ji25a
%I PMLR
%P 1443--1460
%U https://proceedings.mlr.press/v270/ji25a.html
%V 270
%X The ability for robots to perform efficient and zero-shot grasping of object parts is crucial for practical applications and is becoming prevalent with recent advances in Vision-Language Models (VLMs). To bridge the 2D-to-3D gap for representations to support such a capability, existing methods rely on neural fields (NeRFs) via differentiable rendering or point-based projection methods. However, we demonstrate that NeRFs are inappropriate for scene changes due to its implicitness and point-based methods are inaccurate for part localization without rendering-based optimization. To amend these issues, we propose GraspSplats. Using depth supervision and a novel reference feature computation method, GraspSplats can generate high-quality scene representations under 60 seconds. We further validate the advantages of Gaussian-based representation by showing that the explicit and optimized geometry in GraspSplats is sufficient to natively support (1) real-time grasp sampling and (2) dynamic and articulated object manipulation with point trackers. With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods under diverse task settings. In particular, GraspSplats outperforms NeRF-based methods like F3RM and LERF-TOGO, and 2D detection methods. The code will be released.
APA
Ji, M., Qiu, R., Zou, X. & Wang, X. (2025). GraspSplats: Efficient Manipulation with 3D Feature Splatting. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:1443-1460. Available from https://proceedings.mlr.press/v270/ji25a.html.