Detection versus Instance Segmentation for Multi-Species Malaria Diagnosis: A Head-to-Head Comparison and Multi-Dataset Validation of YOLOv12 Architectures with Small Object Optimization

Ahmed Tahiru Issah; Idaya Seidu; Carine Mukamakuza

Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:4683-4702, 2026.

Abstract

Automated malaria parasite detection using deep learning holds promise for addressing diagnostic gaps in resource-limited settings, yet most studies rely on single-dataset evaluations that fail to capture real-world variability. In this work, we rigorously validate YOLOv12-based architectures for malaria detection across diverse geographic and institutional contexts. We introduce a dual-head architecture combining instance segmentation with a high-resolution P2 detection head to target tiny ring-stage parasites. Our evaluation on a diverse Rwandan thick-smear dataset (2,739 images) and two external datasets from Ghana (Lacuna) and Nigeria (FASTMAL) reveals critical insights into model robustness. While the proposed YOLOv12-Seg-N-P2 model achieves state-of-the-art internal performance (mAP@50 $0.888$) and significantly improves detection of challenging P. vivax ($+10.9%$) and P. falciparum ring forms, external validation exposes severe domain shift, with performance dropping by $>80%$ on unseen datasets. We further demonstrate that while P2 heads enhance morphological precision on source data, they reduce zero-shot generalization, likely by overfitting to dataset-specific acquisition characteristics. We additionally evaluate white blood cell (WBC)-anchored stain normalization and pixel-scale rescaling as inference-time domain adaptation strategies. While WBC detection improves substantially (up to $+45%$ on Lacuna), P. falciparum detection remains critically low across both external datasets despite partial recovery on FASTMAL, confirming that preprocessing-based adaptation alone is insufficient for reliable cross-site parasite detection.

Cite this Paper

BibTeX

@InProceedings{pmlr-v315-issah26a,
  title = 	 {Detection versus Instance Segmentation for Multi-Species Malaria Diagnosis: A Head-to-Head Comparison and Multi-Dataset Validation of YOLOv12 Architectures with Small Object Optimization},
  author =       {Issah, Ahmed Tahiru and Seidu, Idaya and Mukamakuza, Carine},
  booktitle = 	 {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {4683--4702},
  year = 	 {2026},
  editor = 	 {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = 	 {315},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {08--10 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v315/main/assets/issah26a/issah26a.pdf},
  url = 	 {https://proceedings.mlr.press/v315/issah26a.html},
  abstract = 	 {Automated malaria parasite detection using deep learning holds promise for addressing diagnostic gaps in resource-limited settings, yet most studies rely on single-dataset evaluations that fail to capture real-world variability. In this work, we rigorously validate YOLOv12-based architectures for malaria detection across diverse geographic and institutional contexts. We introduce a dual-head architecture combining instance segmentation with a high-resolution P2 detection head to target tiny ring-stage parasites. Our evaluation on a diverse Rwandan thick-smear dataset (2,739 images) and two external datasets from Ghana (Lacuna) and Nigeria (FASTMAL) reveals critical insights into model robustness. While the proposed YOLOv12-Seg-N-P2 model achieves state-of-the-art internal performance (mAP@50 $0.888$) and significantly improves detection of challenging P. vivax ($+10.9\%$) and P. falciparum ring forms, external validation exposes severe domain shift, with performance dropping by $>80\%$ on unseen datasets. We further demonstrate that while P2 heads enhance morphological precision on source data, they reduce zero-shot generalization, likely by overfitting to dataset-specific acquisition characteristics. We additionally evaluate white blood cell (WBC)-anchored stain normalization and pixel-scale rescaling as inference-time domain adaptation strategies. While WBC detection improves substantially (up to $+45\%$ on Lacuna), P. falciparum detection remains critically low across both external datasets despite partial recovery on FASTMAL, confirming that preprocessing-based adaptation alone is insufficient for reliable cross-site parasite detection.}
}

Endnote

%0 Conference Paper
%T Detection versus Instance Segmentation for Multi-Species Malaria Diagnosis: A Head-to-Head Comparison and Multi-Dataset Validation of YOLOv12 Architectures with Small Object Optimization
%A Ahmed Tahiru Issah
%A Idaya Seidu
%A Carine Mukamakuza
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-issah26a
%I PMLR
%P 4683--4702
%U https://proceedings.mlr.press/v315/issah26a.html
%V 315
%X Automated malaria parasite detection using deep learning holds promise for addressing diagnostic gaps in resource-limited settings, yet most studies rely on single-dataset evaluations that fail to capture real-world variability. In this work, we rigorously validate YOLOv12-based architectures for malaria detection across diverse geographic and institutional contexts. We introduce a dual-head architecture combining instance segmentation with a high-resolution P2 detection head to target tiny ring-stage parasites. Our evaluation on a diverse Rwandan thick-smear dataset (2,739 images) and two external datasets from Ghana (Lacuna) and Nigeria (FASTMAL) reveals critical insights into model robustness. While the proposed YOLOv12-Seg-N-P2 model achieves state-of-the-art internal performance (mAP@50 $0.888$) and significantly improves detection of challenging P. vivax ($+10.9%$) and P. falciparum ring forms, external validation exposes severe domain shift, with performance dropping by $>80%$ on unseen datasets. We further demonstrate that while P2 heads enhance morphological precision on source data, they reduce zero-shot generalization, likely by overfitting to dataset-specific acquisition characteristics. We additionally evaluate white blood cell (WBC)-anchored stain normalization and pixel-scale rescaling as inference-time domain adaptation strategies. While WBC detection improves substantially (up to $+45%$ on Lacuna), P. falciparum detection remains critically low across both external datasets despite partial recovery on FASTMAL, confirming that preprocessing-based adaptation alone is insufficient for reliable cross-site parasite detection.

APA

Issah, A.T., Seidu, I. & Mukamakuza, C. (2026). Detection versus Instance Segmentation for Multi-Species Malaria Diagnosis: A Head-to-Head Comparison and Multi-Dataset Validation of YOLOv12 Architectures with Small Object Optimization. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:4683-4702. Available from https://proceedings.mlr.press/v315/issah26a.html.
