VFMStitch: A Vision-Foundation-Model Empowered Framework for 3D Ultrasound Stitching via Geometric–Semantic Feature Fusion

Xing Yao; Nick DiSanto; Runxuan Yu; Jiacheng Wang; Daiwei Lu; Gabriel Arenas; Baris Oguz; Alison Pouch; Nadav Schwartz; Brett C Byram; Ipek Oguz

Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:2621-2639, 2026.

Abstract

3D ultrasound (3DUS) stitching expands the field-of-view (FOV) by registering partially overlapping 3DUS volumes acquired from different probe positions. This task is intrinsically difficult due to large inter-volume translations and rotations, the impact of the sector-shaped FOV, as well as the heavy noise and artifacts inherent to ultrasound. With the rapid progress of Vision Foundation Models (VFMs) such as DINOv3, VFM-derived features have recently shown promise for downstream medical image registration tasks. However, existing VFM-based approaches primarily focus on deformable registration and are rarely evaluated for rigid alignment under large motions. Moreover, the feasibility of leveraging VFM-derived features for robust 3DUS stitching remains largely unexplored. In this study, we introduce VFMStitch, the first training-free, VFM-empowered 3DUS stitching framework that integrates point-cloud (PCD)–based geometric features with DINOv3-derived semantic descriptors. Extensive experiments demonstrate that VFMStitch substantially improves rigid registration accuracy compared to existing methods, validating the effectiveness of geometric–semantic fusion for challenging 3DUS stitching scenarios.

Cite this Paper

BibTeX

@InProceedings{pmlr-v315-yao26a,
  title = 	 {VFMStitch: A Vision-Foundation-Model Empowered Framework for 3D Ultrasound Stitching via Geometric–Semantic Feature Fusion},
  author =       {Yao, Xing and DiSanto, Nick and Yu, Runxuan and Wang, Jiacheng and Lu, Daiwei and Arenas, Gabriel and Oguz, Baris and Pouch, Alison and Schwartz, Nadav and Byram, Brett C and Oguz, Ipek},
  booktitle = 	 {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {2621--2639},
  year = 	 {2026},
  editor = 	 {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = 	 {315},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {08--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v315/main/assets/yao26a/yao26a.pdf},
  url = 	 {https://proceedings.mlr.press/v315/yao26a.html},
  abstract = 	 {3D ultrasound (3DUS) stitching expands the field-of-view (FOV) by registering partially overlapping 3DUS volumes acquired from different probe positions. This task is intrinsically difficult due to large inter-volume translations and rotations, the impact of the sector-shaped FOV, as well as the heavy noise and artifacts inherent to ultrasound. With the rapid progress of Vision Foundation Models (VFMs) such as DINOv3, VFM-derived features have recently shown promise for downstream medical image registration tasks. However, existing VFM-based approaches primarily focus on deformable registration and are rarely evaluated for rigid alignment under large motions. Moreover, the feasibility of leveraging VFM-derived features for robust 3DUS stitching remains largely unexplored. In this study, we introduce VFMStitch, the first training-free, VFM-empowered 3DUS stitching framework that integrates point-cloud (PCD)–based geometric features with DINOv3-derived semantic descriptors. Extensive experiments demonstrate that VFMStitch substantially improves rigid registration accuracy compared to existing methods, validating the effectiveness of geometric–semantic fusion for challenging 3DUS stitching scenarios.}
}

Endnote

%0 Conference Paper
%T VFMStitch: A Vision-Foundation-Model Empowered Framework for 3D Ultrasound Stitching via Geometric–Semantic Feature Fusion
%A Xing Yao
%A Nick DiSanto
%A Runxuan Yu
%A Jiacheng Wang
%A Daiwei Lu
%A Gabriel Arenas
%A Baris Oguz
%A Alison Pouch
%A Nadav Schwartz
%A Brett C Byram
%A Ipek Oguz
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-yao26a
%I PMLR
%P 2621--2639
%U https://proceedings.mlr.press/v315/yao26a.html
%V 315
%X 3D ultrasound (3DUS) stitching expands the field-of-view (FOV) by registering partially overlapping 3DUS volumes acquired from different probe positions. This task is intrinsically difficult due to large inter-volume translations and rotations, the impact of the sector-shaped FOV, as well as the heavy noise and artifacts inherent to ultrasound. With the rapid progress of Vision Foundation Models (VFMs) such as DINOv3, VFM-derived features have recently shown promise for downstream medical image registration tasks. However, existing VFM-based approaches primarily focus on deformable registration and are rarely evaluated for rigid alignment under large motions. Moreover, the feasibility of leveraging VFM-derived features for robust 3DUS stitching remains largely unexplored. In this study, we introduce VFMStitch, the first training-free, VFM-empowered 3DUS stitching framework that integrates point-cloud (PCD)–based geometric features with DINOv3-derived semantic descriptors. Extensive experiments demonstrate that VFMStitch substantially improves rigid registration accuracy compared to existing methods, validating the effectiveness of geometric–semantic fusion for challenging 3DUS stitching scenarios.

APA

Yao, X., DiSanto, N., Yu, R., Wang, J., Lu, D., Arenas, G., Oguz, B., Pouch, A., Schwartz, N., Byram, B. C., & Oguz, I. (2026). VFMStitch: A Vision-Foundation-Model Empowered Framework for 3D Ultrasound Stitching via Geometric–Semantic Feature Fusion. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:2621-2639. Available from https://proceedings.mlr.press/v315/yao26a.html
