Implicit Grasp Diffusion: Bridging the Gap between Dense Prediction and Sampling-based Grasping

Pinhao Song; Pengteng Li; Renaud Detry

Proceedings of The 8th Conference on Robot Learning, PMLR 270:2948-2964, 2025.

Abstract

There are two dominant approaches in modern robot grasp planning: dense prediction and sampling-based methods. Dense prediction calculates viable grasps across the robot’s view but is limited to predicting one grasp per voxel. Sampling-based methods, on the other hand, encode multi-modal grasp distributions, allowing for different grasp approaches at a point. However, these methods rely on a global latent representation, which struggles to represent the entire field of view, resulting in coarse grasps. To address this, we introduce Implicit Grasp Diffusion (IGD), which combines the strengths of both methods by using implicit neural representations to extract detailed local features and sampling grasps from diffusion models conditioned on these features. Evaluations on clutter removal tasks in both simulated and real-world environments show that IGD delivers high accuracy, noise resilience, and multi-modal grasp pose capabilities.

Cite this Paper

BibTeX

@InProceedings{pmlr-v270-song25b,
  title = 	 {Implicit Grasp Diffusion: Bridging the Gap between Dense Prediction and Sampling-based Grasping},
  author =       {Song, Pinhao and Li, Pengteng and Detry, Renaud},
  booktitle = 	 {Proceedings of The 8th Conference on Robot Learning},
  pages = 	 {2948--2964},
  year = 	 {2025},
  editor = 	 {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = 	 {270},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v270/main/assets/song25b/song25b.pdf},
  url = 	 {https://proceedings.mlr.press/v270/song25b.html},
  abstract = 	 {There are two dominant approaches in modern robot grasp planning: dense prediction and sampling-based methods. Dense prediction calculates viable grasps across the robot’s view but is limited to predicting one grasp per voxel. Sampling-based methods, on the other hand, encode multi-modal grasp distributions, allowing for different grasp approaches at a point. However, these methods rely on a global latent representation, which struggles to represent the entire field of view, resulting in coarse grasps. To address this, we introduce \emph{Implicit Grasp Diffusion} (IGD), which combines the strengths of both methods by using implicit neural representations to extract detailed local features and sampling grasps from diffusion models conditioned on these features. Evaluations on clutter removal tasks in both simulated and real-world environments show that IGD delivers high accuracy, noise resilience, and multi-modal grasp pose capabilities.}
}

Endnote

%0 Conference Paper
%T Implicit Grasp Diffusion: Bridging the Gap between Dense Prediction and Sampling-based Grasping
%A Pinhao Song
%A Pengteng Li
%A Renaud Detry
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-song25b
%I PMLR
%P 2948--2964
%U https://proceedings.mlr.press/v270/song25b.html
%V 270
%X There are two dominant approaches in modern robot grasp planning: dense prediction and sampling-based methods. Dense prediction calculates viable grasps across the robot’s view but is limited to predicting one grasp per voxel. Sampling-based methods, on the other hand, encode multi-modal grasp distributions, allowing for different grasp approaches at a point. However, these methods rely on a global latent representation, which struggles to represent the entire field of view, resulting in coarse grasps. To address this, we introduce Implicit Grasp Diffusion (IGD), which combines the strengths of both methods by using implicit neural representations to extract detailed local features and sampling grasps from diffusion models conditioned on these features. Evaluations on clutter removal tasks in both simulated and real-world environments show that IGD delivers high accuracy, noise resilience, and multi-modal grasp pose capabilities.

APA

Song, P., Li, P. & Detry, R. (2025). Implicit Grasp Diffusion: Bridging the Gap between Dense Prediction and Sampling-based Grasping. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:2948-2964. Available from https://proceedings.mlr.press/v270/song25b.html.
