Multi-site Benchmarking of Deep Learning Models for Intraparenchymal Hemorrhage Segmentation on NCCT

Kauê T N Duarte; Abhijot S Sidhu; Murilo C Barros; Taha Aslan; Donghao Zhang; Jianhai Zhang; Devansh Bhatt; Brij Karmur; Mohamed AlShamrani; Wu Qiu; Aravind Ganesh; Bijoy K Menon

Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:4664-4682, 2026.

Abstract

Intraparenchymal hemorrhage (IPH) is a critical and often fatal subtype of hemorrhagic stroke that requires rapid and accurate diagnosis on non-contrast computed tomography (NCCT) scans for effective treatment. While deep learning (DL) models, particularly convolutional neural networks (CNNs), offer potential for automating IPH segmentation, their real-world clinical utility is often limited by the lack of explicit data integration across diverse hospital sites with varying imaging protocols. This study benchmarked five prominent CNN architectures for IPH segmentation on a heterogeneous dataset from 17 clinical sites: baseline U-Net, Attention U-Net, Feature Pyramid Network (FPN), Swin U-Net, and Trans U-Net. Models were rigorously evaluated using the F-measure (also known as the Dice coefficient), Intersection over Union (IoU), and the 95% Hausdorff distance ($d_{H95}$). The advanced CNN variants (Attention U-Net, FPN, Trans U-Net) significantly outperformed the baseline U-Net in F-measure and IoU (e.g., FPN F-measure: $0.868$ vs. U-Net: $0.819$, $p<0.001$), with no significant difference among them. For boundary error, FPN reduced $d_{H95}$ relative to the baseline, whereas the improvement shown by Trans U-Net was not statistically significant. These models exhibited robust cross-site generalization across hemorrhage volumes, with minimal site-specific effects on performance. This study demonstrates that advanced CNN variants can be adopted for IPH segmentation to standardize and potentially accelerate IPH diagnosis.
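The abstract names its three evaluation metrics without defining them. As a hedged illustration only (this is not the authors' evaluation code), the sketch below computes the F-measure (Dice), IoU, and a 95% Hausdorff distance on 2D binary masks in pure Python. Assumptions that do not come from the paper: masks are equal-shape lists of 0/1 rows, boundaries use 4-connectivity, HD95 pools both directed boundary-to-boundary distance sets and takes their 95th percentile (one common convention among several), and the empty-mask behaviour is our own choice. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # hypothetical 2D binary mask: 1 = hemorrhage, 0 = background


def _counts(pred: Mask, ref: Mask) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives)."""
    tp = fp = fn = 0
    for prow, rrow in zip(pred, ref):
        for p, r in zip(prow, rrow):
            if p and r:
                tp += 1
            elif p:
                fp += 1
            elif r:
                fn += 1
    return tp, fp, fn


def dice(pred: Mask, ref: Mask) -> float:
    """F-measure / Dice = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = _counts(pred, ref)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # both masks empty -> 1.0 (assumed convention)


def iou(pred: Mask, ref: Mask) -> float:
    """Intersection over Union = TP / (TP + FP + FN)."""
    tp, fp, fn = _counts(pred, ref)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom  # both masks empty -> 1.0 (assumed convention)


def _boundary(mask: Mask) -> List[Tuple[int, int]]:
    """Foreground pixels with at least one 4-neighbour outside the mask or image."""
    h, w = len(mask), len(mask[0])
    pts = []
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if not (0 <= ni < h and 0 <= nj < w) or not mask[ni][nj]:
                    pts.append((i, j))
                    break
    return pts


def _percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between sorted samples."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def hd95(pred: Mask, ref: Mask, spacing: Tuple[float, float] = (1.0, 1.0)) -> float:
    """95th percentile of pooled symmetric boundary distances, in spacing units."""
    bp, br = _boundary(pred), _boundary(ref)
    if not bp or not br:
        raise ValueError("HD95 is undefined when either mask is empty")

    def directed(a, b):
        # Distance from each boundary point in a to its nearest boundary point in b.
        return [min(math.hypot((i - bi) * spacing[0], (j - bj) * spacing[1])
                    for bi, bj in b) for i, j in a]

    return _percentile(directed(bp, br) + directed(br, bp), 95)


if __name__ == "__main__":
    ref = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]
    pred = [[0, 0, 0, 0],
            [0, 1, 1, 1],
            [0, 1, 1, 1],
            [0, 0, 0, 0]]
    print(dice(pred, ref))           # 0.8
    print(round(iou(pred, ref), 4))  # 0.6667
    print(hd95(pred, ref))           # 1.0
```

Because both overlap scores count the same voxels, a single case's Dice and IoU satisfy Dice = 2·IoU / (1 + IoU), so they always rank models identically per case; means taken over many cases need not satisfy that identity, which is one reason to report both. HD95 complements them by measuring boundary error in physical units once `spacing` is set to the scan's pixel size.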

Cite this Paper

BibTeX

@InProceedings{pmlr-v315-duarte26a,
  title = 	 {Multi-site Benchmarking of Deep Learning Models for Intraparenchymal Hemorrhage Segmentation on NCCT},
  author =       {Duarte, Kau{\^e} T N and Sidhu, Abhijot S and Barros, Murilo C and Aslan, Taha and Zhang, Donghao and Zhang, Jianhai and Bhatt, Devansh and Karmur, Brij and AlShamrani, Mohamed and Qiu, Wu and Ganesh, Aravind and Menon, Bijoy K},
  booktitle = 	 {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {4664--4682},
  year = 	 {2026},
  editor = 	 {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = 	 {315},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {08--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v315/main/assets/duarte26a/duarte26a.pdf},
  url = 	 {https://proceedings.mlr.press/v315/duarte26a.html},
  abstract = 	 {Intraparenchymal hemorrhage (IPH) is a critical and often fatal subtype of hemorrhagic stroke that requires rapid and accurate diagnosis on non-contrast computed tomography (NCCT) scans for effective treatment. While deep learning (DL) models, particularly convolutional neural networks (CNNs), offer potential for automating IPH segmentation, their real-world clinical utility is often limited by the lack of explicit data integration across diverse hospital sites with varying imaging protocols. This study benchmarked five prominent CNN architectures for IPH segmentation on a heterogeneous dataset from 17 clinical sites: baseline U-Net, Attention U-Net, Feature Pyramid Network (FPN), Swin U-Net, and Trans U-Net. Models were rigorously evaluated using the F-measure (also known as the Dice coefficient), Intersection over Union (IoU), and the 95\% Hausdorff distance ($d_{H95}$). The advanced CNN variants (Attention U-Net, FPN, Trans U-Net) significantly outperformed the baseline U-Net in F-measure and IoU (e.g., FPN F-measure: $0.868$ vs. U-Net: $0.819$, $p<0.001$), with no significant difference among them. For boundary error, FPN reduced $d_{H95}$ relative to the baseline, whereas the improvement shown by Trans U-Net was not statistically significant. These models exhibited robust cross-site generalization across hemorrhage volumes, with minimal site-specific effects on performance. This study demonstrates that advanced CNN variants can be adopted for IPH segmentation to standardize and potentially accelerate IPH diagnosis.}
}

Endnote

%0 Conference Paper
%T Multi-site Benchmarking of Deep Learning Models for Intraparenchymal Hemorrhage Segmentation on NCCT
%A Kauê T N Duarte
%A Abhijot S Sidhu
%A Murilo C Barros
%A Taha Aslan
%A Donghao Zhang
%A Jianhai Zhang
%A Devansh Bhatt
%A Brij Karmur
%A Mohamed AlShamrani
%A Wu Qiu
%A Aravind Ganesh
%A Bijoy K Menon
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-duarte26a
%I PMLR
%P 4664--4682
%U https://proceedings.mlr.press/v315/duarte26a.html
%V 315
%X Intraparenchymal hemorrhage (IPH) is a critical and often fatal subtype of hemorrhagic stroke that requires rapid and accurate diagnosis on non-contrast computed tomography (NCCT) scans for effective treatment. While deep learning (DL) models, particularly convolutional neural networks (CNNs), offer potential for automating IPH segmentation, their real-world clinical utility is often limited by the lack of explicit data integration across diverse hospital sites with varying imaging protocols. This study benchmarked five prominent CNN architectures for IPH segmentation on a heterogeneous dataset from 17 clinical sites: baseline U-Net, Attention U-Net, Feature Pyramid Network (FPN), Swin U-Net, and Trans U-Net. Models were rigorously evaluated using the F-measure (also known as the Dice coefficient), Intersection over Union (IoU), and the 95% Hausdorff distance ($d_{H95}$). The advanced CNN variants (Attention U-Net, FPN, Trans U-Net) significantly outperformed the baseline U-Net in F-measure and IoU (e.g., FPN F-measure: $0.868$ vs. U-Net: $0.819$, $p<0.001$), with no significant difference among them. For boundary error, FPN reduced $d_{H95}$ relative to the baseline, whereas the improvement shown by Trans U-Net was not statistically significant. These models exhibited robust cross-site generalization across hemorrhage volumes, with minimal site-specific effects on performance. This study demonstrates that advanced CNN variants can be adopted for IPH segmentation to standardize and potentially accelerate IPH diagnosis.

APA

Duarte, K.T.N., Sidhu, A.S., Barros, M.C., Aslan, T., Zhang, D., Zhang, J., Bhatt, D., Karmur, B., AlShamrani, M., Qiu, W., Ganesh, A. & Menon, B.K. (2026). Multi-site Benchmarking of Deep Learning Models for Intraparenchymal Hemorrhage Segmentation on NCCT. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:4664-4682. Available from https://proceedings.mlr.press/v315/duarte26a.html.
