A Comprehensive Benchmarking and Systematic Analysis of Deep Learning Models for Sonomammogram Segmentation
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:4342-4355, 2026.
Abstract
Accurate segmentation of breast lesions in sonomammograms supports computer-assisted diagnosis and early breast cancer detection. Existing public ultrasound datasets contain duplicates, mislabeled cases, and non-breast images, which leads to unreliable model evaluation. To address this, we construct a curated multi-centre dataset of 3,494 images with expert-verified annotations and patient-level splits. Using this dataset, we define a unified benchmarking protocol and evaluate eleven representative architectures, including nnU-Net variants, SegResNet, SwinUNETR, U-Mamba, and SAMed. All models are trained and assessed under identical preprocessing, training, and evaluation settings. Performance is measured with the Dice, Sensitivity, Specificity, Accuracy, and Hausdorff Distance metrics. We also analyse how the choice of loss function and the volume of training data influence performance. SAMed p512 obtains the best Dice score at 0.860 $\pm$ 0.141 and the lowest Hausdorff Distance at 3.896 $\pm$ 5.472. The benchmark provides a reproducible reference for breast ultrasound segmentation and clarifies how architecture design and data-related factors shape performance in this setting.
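To make the reported metrics concrete, the following is a minimal sketch of two of them: the Dice score (overlap between predicted and ground-truth masks) and the symmetric Hausdorff distance (worst-case boundary disagreement). This is an illustrative implementation, not the paper's actual evaluation code; the function names and the flat-list mask representation are assumptions made for this sketch.

```python
def dice_score(pred, gt, eps=1e-8):
    """Dice coefficient between two binary masks.

    pred, gt: equal-length flat lists of 0/1 pixel labels
    (illustrative representation, not the paper's pipeline).
    """
    tp = sum(1 for p, g in zip(pred, gt) if p and g)  # overlapping foreground pixels
    return 2.0 * tp / (sum(pred) + sum(gt) + eps)


def hausdorff_distance(a_pts, b_pts):
    """Symmetric Hausdorff distance between two point sets.

    a_pts, b_pts: lists of (row, col) boundary-pixel coordinates.
    Brute-force O(n*m) version for clarity only.
    """
    def directed(xs, ys):
        # largest distance from any point in xs to its nearest point in ys
        return max(
            min(((x0 - y0) ** 2 + (x1 - y1) ** 2) ** 0.5 for y0, y1 in ys)
            for x0, x1 in xs
        )
    return max(directed(a_pts, b_pts), directed(b_pts, a_pts))
```

A higher Dice indicates better region overlap, while a lower Hausdorff distance indicates that the predicted lesion boundary stays close to the annotated one, which is why the two metrics are reported together in the abstract.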