SEG4SEG: Identifying Systematic Failure Modes in Segmentation by Subgroup Discovery Methods

Nina Weng, Eike Petersen, Alceu Bissoto, Susu Sun, Lisa M. Koch, Aasa Feragen, Siavash Bigdeli, Christian F. Baumgartner
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:1983-2002, 2026.

Abstract

Deep learning models for medical image segmentation can achieve high overall performance but fail systematically on critical subgroups. While Slice Discovery Methods (SDM) have shown promise in revealing classification failures, their effectiveness for segmentation remains unexplored. Moreover, although various systematic failures have been reported in segmentation tasks, no prior work has systematically categorized them. In this work, we address both gaps. First, we categorize potential sources of systematic errors in medical image segmentation. Second, we empirically investigate whether SDMs can identify problematic slices in each of those categories without manual annotations. Our evaluation covers four controlled failure types and two real-world failure cases, using medical imaging datasets and explicit success criteria for SDM evaluation. Our experiments show that SDMs adapted for segmentation can identify systematic errors, demonstrating their potential for failure analysis in medical imaging.

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-weng26a,
  title = {SEG4SEG: Identifying Systematic Failure Modes in Segmentation by Subgroup Discovery Methods},
  author = {Weng, Nina and Petersen, Eike and Bissoto, Alceu and Sun, Susu and Koch, Lisa M. and Feragen, Aasa and Bigdeli, Siavash and Baumgartner, Christian F.},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = {1983--2002},
  year = {2026},
  editor = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = {315},
  series = {Proceedings of Machine Learning Research},
  month = {08--10 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/weng26a/weng26a.pdf},
  url = {https://proceedings.mlr.press/v315/weng26a.html},
  abstract = {Deep learning models for medical image segmentation can achieve high overall performance but fail systematically on critical subgroups. While Slice Discovery Methods (SDM) have shown promise in revealing classification failures, their effectiveness for segmentation remains unexplored. Moreover, although various systematic failures have been reported in segmentation tasks, no prior work has systematically categorized them. In this work, we address both gaps. First, we categorize potential sources of systematic errors in medical image segmentation. Second, we empirically investigate whether SDMs can identify problematic slices in each of those categories without manual annotations. Our evaluation covers four controlled failure types and two real-world failure cases, using medical imaging datasets and explicit success criteria for SDM evaluation. Our experiments show that SDMs adapted for segmentation can identify systematic errors, demonstrating their potential for failure analysis in medical imaging.}
}
Endnote
%0 Conference Paper
%T SEG4SEG: Identifying Systematic Failure Modes in Segmentation by Subgroup Discovery Methods
%A Nina Weng
%A Eike Petersen
%A Alceu Bissoto
%A Susu Sun
%A Lisa M. Koch
%A Aasa Feragen
%A Siavash Bigdeli
%A Christian F. Baumgartner
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-weng26a
%I PMLR
%P 1983--2002
%U https://proceedings.mlr.press/v315/weng26a.html
%V 315
%X Deep learning models for medical image segmentation can achieve high overall performance but fail systematically on critical subgroups. While Slice Discovery Methods (SDM) have shown promise in revealing classification failures, their effectiveness for segmentation remains unexplored. Moreover, although various systematic failures have been reported in segmentation tasks, no prior work has systematically categorized them. In this work, we address both gaps. First, we categorize potential sources of systematic errors in medical image segmentation. Second, we empirically investigate whether SDMs can identify problematic slices in each of those categories without manual annotations. Our evaluation covers four controlled failure types and two real-world failure cases, using medical imaging datasets and explicit success criteria for SDM evaluation. Our experiments show that SDMs adapted for segmentation can identify systematic errors, demonstrating their potential for failure analysis in medical imaging.
APA
Weng, N., Petersen, E., Bissoto, A., Sun, S., Koch, L.M., Feragen, A., Bigdeli, S. & Baumgartner, C.F. (2026). SEG4SEG: Identifying Systematic Failure Modes in Segmentation by Subgroup Discovery Methods. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:1983-2002. Available from https://proceedings.mlr.press/v315/weng26a.html.
