SAMURAI: A Two-Stage Foundation Model Pipeline for Robust Optic Nerve Head Segmentation in Fundus Images

Carlos Perez; Neeru Gupta; Ipek Oruc

Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:909-915, 2026.

Abstract

Accurate segmentation of the optic nerve head (ONH) is essential for automated glaucoma assessment using the Cup-to-Disc Ratio (CDR). However, conventional convolutional neural networks (CNNs) often exhibit performance degradation under domain shift caused by variations in fundus imaging devices and protocols. Foundation models offer a potential solution due to their large-scale pre-training and intrinsic feature invariance. While the Segment Anything Model (SAM) offers a robust alternative, recent adaptations have resorted to complex, task-specific architectural modifications to handle retinal geometry. In this paper, we propose SAMURAI, a two-stage foundation model pipeline that combines a YOLOv12x-based ONH localizer with a minimally adapted MedSAM foundation model. We rigorously evaluate this supervised baseline against exploratory variants incorporating geometric inductive biases (polar transformations) and semi-supervised learning (SSL). On the REFUGE benchmark, our simplified approach establishes a new state-of-the-art, achieving an Optic Cup Dice of 0.920, significantly outperforming specialized models like FunduSAM (0.867). Furthermore, our ablation study reveals that additional architectural complexity does not confer measurable performance gains over the foundation baseline. These findings suggest that large-scale pre-trained foundation models provide sufficient robustness for ONH segmentation without task-specific architectural modifications.
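The two quantities named in the abstract, the Dice score and the Cup-to-Disc Ratio, can be illustrated with a minimal sketch. It uses the standard Dice definition and the vertical CDR, one common variant; the abstract does not state which CDR variant or evaluation code the authors used, so the function names, the toy masks, and the empty-mask convention below are all assumptions made for illustration.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 values


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * inter / total


def vertical_extent(mask: Mask) -> int:
    """Number of rows spanned by the mask, topmost to bottommost."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def vertical_cdr(cup: Mask, disc: Mask) -> float:
    """Vertical cup-to-disc ratio: cup height divided by disc height."""
    disc_height = vertical_extent(disc)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup) / disc_height


# Hypothetical 4x4 toy masks, not data from the paper.
disc = [[1, 1, 1, 1] for _ in range(4)]
cup_true = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
cup_pred = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]

cup_dice = dice(cup_pred, cup_true)  # 4 shared pixels, 4 + 6 in total
cdr = vertical_cdr(cup_pred, disc)   # cup spans 2 rows, disc spans 4
```

In a two-stage pipeline of the kind described, masks like these would come from the segmentation stage applied to the region proposed by the localizer; the sketch covers only the metric step.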

Cite this Paper

BibTeX

@InProceedings{pmlr-v318-perez26a,
  title = 	 {SAMURAI: A Two-Stage Foundation Model Pipeline for Robust Optic Nerve Head Segmentation in Fundus Images},
  author =       {Perez, Carlos and Gupta, Neeru and Oruc, Ipek},
  booktitle = 	 {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = 	 {909--915},
  year = 	 {2026},
  editor = 	 {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = 	 {318},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--29 May},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v318/main/assets/perez26a/perez26a.pdf},
  url = 	 {https://proceedings.mlr.press/v318/perez26a.html},
  abstract = 	 {Accurate segmentation of the optic nerve head (ONH) is essential for automated glaucoma assessment using the Cup-to-Disc Ratio (CDR). However, conventional convolutional neural networks (CNNs) often exhibit performance degradation under domain shift caused by variations in fundus imaging devices and protocols. Foundation models offer a potential solution due to their large-scale pre-training and intrinsic feature invariance. While the Segment Anything Model (SAM) offers a robust alternative, recent adaptations have resorted to complex, task-specific architectural modifications to handle retinal geometry. In this paper, we propose SAMURAI, a two-stage foundation model pipeline that combines a YOLOv12x-based ONH localizer with a minimally adapted MedSAM foundation model. We rigorously evaluate this supervised baseline against exploratory variants incorporating geometric inductive biases (polar transformations) and semi-supervised learning (SSL). On the REFUGE benchmark, our simplified approach establishes a new state-of-the-art, achieving an Optic Cup Dice of 0.920, significantly outperforming specialized models like FunduSAM (0.867). Furthermore, our ablation study reveals that additional architectural complexity does not confer measurable performance gains over the foundation baseline. These findings suggest that large-scale pre-trained foundation models provide sufficient robustness for ONH segmentation without task-specific architectural modifications.}
}

Endnote

%0 Conference Paper
%T SAMURAI: A Two-Stage Foundation Model Pipeline for Robust Optic Nerve Head Segmentation in Fundus Images
%A Carlos Perez
%A Neeru Gupta
%A Ipek Oruc
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-perez26a
%I PMLR
%P 909--915
%U https://proceedings.mlr.press/v318/perez26a.html
%V 318
%X Accurate segmentation of the optic nerve head (ONH) is essential for automated glaucoma assessment using the Cup-to-Disc Ratio (CDR). However, conventional convolutional neural networks (CNNs) often exhibit performance degradation under domain shift caused by variations in fundus imaging devices and protocols. Foundation models offer a potential solution due to their large-scale pre-training and intrinsic feature invariance. While the Segment Anything Model (SAM) offers a robust alternative, recent adaptations have resorted to complex, task-specific architectural modifications to handle retinal geometry. In this paper, we propose SAMURAI, a two-stage foundation model pipeline that combines a YOLOv12x-based ONH localizer with a minimally adapted MedSAM foundation model. We rigorously evaluate this supervised baseline against exploratory variants incorporating geometric inductive biases (polar transformations) and semi-supervised learning (SSL). On the REFUGE benchmark, our simplified approach establishes a new state-of-the-art, achieving an Optic Cup Dice of 0.920, significantly outperforming specialized models like FunduSAM (0.867). Furthermore, our ablation study reveals that additional architectural complexity does not confer measurable performance gains over the foundation baseline. These findings suggest that large-scale pre-trained foundation models provide sufficient robustness for ONH segmentation without task-specific architectural modifications.

APA

Perez, C., Gupta, N. & Oruc, I. (2026). SAMURAI: A Two-Stage Foundation Model Pipeline for Robust Optic Nerve Head Segmentation in Fundus Images. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:909-915. Available from https://proceedings.mlr.press/v318/perez26a.html.
