UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation

Zhaodong Jiang, Ashish Sinha, Tongtong Cao, Yuan Ren, Bingbing Liu, Binbin Xu
Proceedings of The 9th Conference on Robot Learning, PMLR 305:3589-3604, 2025.

Abstract

Estimating the 6D pose of novel objects is a fundamental yet challenging problem in robotics, often relying on access to object CAD models. However, acquiring such models can be costly and impractical. Recent approaches aim to bypass this requirement by leveraging strong priors from foundation models to reconstruct objects from single or multi-view images, but typically require additional training or produce hallucinated geometry. To this end, we propose $\textit{UnPose}$, a novel framework for zero-shot, model-free 6D object pose estimation and reconstruction that exploits 3D priors and uncertainty estimates from a pre-trained diffusion model. Specifically, starting from a single-view RGB-D frame, $\textit{UnPose}$ uses a multi-view diffusion model to estimate an initial 3D model using a 3D Gaussian Splatting (3DGS) representation, along with pixel-wise epistemic uncertainty estimates. As additional observations become available, we incrementally refine the 3DGS model by fusing new views guided by the diffusion model's uncertainty, thereby continuously improving the pose estimation accuracy and 3D reconstruction quality. To ensure global consistency, the diffusion-prior-generated views and subsequent observations are further integrated in a pose graph and jointly optimized into a coherent 3DGS field. Extensive experiments demonstrate that $\textit{UnPose}$ significantly outperforms existing approaches in both 6D pose estimation accuracy and 3D reconstruction quality. We further showcase its practical applicability in real-world robotic manipulation tasks.
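The uncertainty-guided fusion described above can be illustrated with a minimal sketch. This is not the authors' implementation; all names are illustrative, and it assumes the pixel-wise epistemic uncertainty behaves like a variance, so views are combined by standard precision (inverse-variance) weighting: pixels where the diffusion prior is uncertain defer to the real observation, and vice versa.

```python
import numpy as np

def fuse_views(prior_rgb, prior_var, obs_rgb, obs_var, eps=1e-8):
    """Precision-weighted fusion of a diffusion-prior rendering with a new
    observation (illustrative sketch, not UnPose's actual update rule).

    prior_rgb, obs_rgb: (H, W, 3) color images
    prior_var, obs_var: (H, W) per-pixel epistemic uncertainty (variance)
    Returns the fused image and its fused variance.
    """
    # Inverse-variance (precision) weights; eps avoids division by zero.
    w_prior = 1.0 / (prior_var + eps)
    w_obs = 1.0 / (obs_var + eps)
    w_sum = w_prior + w_obs
    # High-uncertainty prior pixels are dominated by the observation.
    fused_rgb = (w_prior[..., None] * prior_rgb
                 + w_obs[..., None] * obs_rgb) / w_sum[..., None]
    # Fusing two independent estimates reduces the resulting variance.
    fused_var = 1.0 / w_sum
    return fused_rgb, fused_var
```

For example, where the prior's variance is very large and the observation's is small, the fused color essentially equals the observed color, which is the qualitative behavior the paper's incremental refinement relies on.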

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-jiang25d,
  title     = {UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation},
  author    = {Jiang, Zhaodong and Sinha, Ashish and Cao, Tongtong and Ren, Yuan and Liu, Bingbing and Xu, Binbin},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {3589--3604},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/jiang25d/jiang25d.pdf},
  url       = {https://proceedings.mlr.press/v305/jiang25d.html},
  abstract  = {Estimating the 6D pose of novel objects is a fundamental yet challenging problem in robotics, often relying on access to object CAD models. However, acquiring such models can be costly and impractical. Recent approaches aim to bypass this requirement by leveraging strong priors from foundation models to reconstruct objects from single or multi-view images, but typically require additional training or produce hallucinated geometry. To this end, we propose $\textit{UnPose}$, a novel framework for zero-shot, model-free 6D object pose estimation and reconstruction that exploits 3D priors and uncertainty estimates from a pre-trained diffusion model. Specifically, starting from a single-view RGB-D frame, $\textit{UnPose}$ uses a multi-view diffusion model to estimate an initial 3D model using a 3D Gaussian Splatting (3DGS) representation, along with pixel-wise epistemic uncertainty estimates. As additional observations become available, we incrementally refine the 3DGS model by fusing new views guided by the diffusion model's uncertainty, thereby continuously improving the pose estimation accuracy and 3D reconstruction quality. To ensure global consistency, the diffusion-prior-generated views and subsequent observations are further integrated in a pose graph and jointly optimized into a coherent 3DGS field. Extensive experiments demonstrate that $\textit{UnPose}$ significantly outperforms existing approaches in both 6D pose estimation accuracy and 3D reconstruction quality. We further showcase its practical applicability in real-world robotic manipulation tasks.}
}
Endnote
%0 Conference Paper
%T UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation
%A Zhaodong Jiang
%A Ashish Sinha
%A Tongtong Cao
%A Yuan Ren
%A Bingbing Liu
%A Binbin Xu
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-jiang25d
%I PMLR
%P 3589--3604
%U https://proceedings.mlr.press/v305/jiang25d.html
%V 305
%X Estimating the 6D pose of novel objects is a fundamental yet challenging problem in robotics, often relying on access to object CAD models. However, acquiring such models can be costly and impractical. Recent approaches aim to bypass this requirement by leveraging strong priors from foundation models to reconstruct objects from single or multi-view images, but typically require additional training or produce hallucinated geometry. To this end, we propose $\textit{UnPose}$, a novel framework for zero-shot, model-free 6D object pose estimation and reconstruction that exploits 3D priors and uncertainty estimates from a pre-trained diffusion model. Specifically, starting from a single-view RGB-D frame, $\textit{UnPose}$ uses a multi-view diffusion model to estimate an initial 3D model using a 3D Gaussian Splatting (3DGS) representation, along with pixel-wise epistemic uncertainty estimates. As additional observations become available, we incrementally refine the 3DGS model by fusing new views guided by the diffusion model's uncertainty, thereby continuously improving the pose estimation accuracy and 3D reconstruction quality. To ensure global consistency, the diffusion-prior-generated views and subsequent observations are further integrated in a pose graph and jointly optimized into a coherent 3DGS field. Extensive experiments demonstrate that $\textit{UnPose}$ significantly outperforms existing approaches in both 6D pose estimation accuracy and 3D reconstruction quality. We further showcase its practical applicability in real-world robotic manipulation tasks.
APA
Jiang, Z., Sinha, A., Cao, T., Ren, Y., Liu, B. & Xu, B. (2025). UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:3589-3604. Available from https://proceedings.mlr.press/v305/jiang25d.html.
