Detection versus Instance Segmentation for Multi-Species Malaria Diagnosis: A Head-to-Head Comparison and Multi-Dataset Validation of YOLOv12 Architectures with Small Object Optimization

Ahmed Tahiru Issah, Idaya Seidu, Carine Mukamakuza
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:4683-4702, 2026.

Abstract

Automated malaria parasite detection using deep learning holds promise for addressing diagnostic gaps in resource-limited settings, yet most studies rely on single-dataset evaluations that fail to capture real-world variability. In this work, we rigorously validate YOLOv12-based architectures for malaria detection across diverse geographic and institutional contexts. We introduce a dual-head architecture combining instance segmentation with a high-resolution P2 detection head to target tiny ring-stage parasites. Our evaluation on a diverse Rwandan thick-smear dataset (2,739 images) and two external datasets from Ghana (Lacuna) and Nigeria (FASTMAL) reveals critical insights into model robustness. While the proposed YOLOv12-Seg-N-P2 model achieves state-of-the-art internal performance (mAP@50 $0.888$) and significantly improves detection of challenging P. vivax ($+10.9\%$) and P. falciparum ring forms, external validation exposes severe domain shift, with performance dropping by $>80\%$ on unseen datasets. We further demonstrate that while P2 heads enhance morphological precision on source data, they reduce zero-shot generalization, likely by overfitting to dataset-specific acquisition characteristics. We additionally evaluate white blood cell (WBC)-anchored stain normalization and pixel-scale rescaling as inference-time domain adaptation strategies. While WBC detection improves substantially (up to $+45\%$ on Lacuna), P. falciparum detection remains critically low across both external datasets despite partial recovery on FASTMAL, confirming that preprocessing-based adaptation alone is insufficient for reliable cross-site parasite detection.

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-issah26a,
  title = {Detection versus Instance Segmentation for Multi-Species Malaria Diagnosis: A Head-to-Head Comparison and Multi-Dataset Validation of YOLOv12 Architectures with Small Object Optimization},
  author = {Issah, Ahmed Tahiru and Seidu, Idaya and Mukamakuza, Carine},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = {4683--4702},
  year = {2026},
  editor = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = {315},
  series = {Proceedings of Machine Learning Research},
  month = {08--10 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/issah26a/issah26a.pdf},
  url = {https://proceedings.mlr.press/v315/issah26a.html},
  abstract = {Automated malaria parasite detection using deep learning holds promise for addressing diagnostic gaps in resource-limited settings, yet most studies rely on single-dataset evaluations that fail to capture real-world variability. In this work, we rigorously validate YOLOv12-based architectures for malaria detection across diverse geographic and institutional contexts. We introduce a dual-head architecture combining instance segmentation with a high-resolution P2 detection head to target tiny ring-stage parasites. Our evaluation on a diverse Rwandan thick-smear dataset (2,739 images) and two external datasets from Ghana (Lacuna) and Nigeria (FASTMAL) reveals critical insights into model robustness. While the proposed YOLOv12-Seg-N-P2 model achieves state-of-the-art internal performance (mAP@50 $0.888$) and significantly improves detection of challenging P. vivax ($+10.9\%$) and P. falciparum ring forms, external validation exposes severe domain shift, with performance dropping by $>80\%$ on unseen datasets. We further demonstrate that while P2 heads enhance morphological precision on source data, they reduce zero-shot generalization, likely by overfitting to dataset-specific acquisition characteristics. We additionally evaluate white blood cell (WBC)-anchored stain normalization and pixel-scale rescaling as inference-time domain adaptation strategies. While WBC detection improves substantially (up to $+45\%$ on Lacuna), P. falciparum detection remains critically low across both external datasets despite partial recovery on FASTMAL, confirming that preprocessing-based adaptation alone is insufficient for reliable cross-site parasite detection.}
}
Endnote
%0 Conference Paper
%T Detection versus Instance Segmentation for Multi-Species Malaria Diagnosis: A Head-to-Head Comparison and Multi-Dataset Validation of YOLOv12 Architectures with Small Object Optimization
%A Ahmed Tahiru Issah
%A Idaya Seidu
%A Carine Mukamakuza
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-issah26a
%I PMLR
%P 4683--4702
%U https://proceedings.mlr.press/v315/issah26a.html
%V 315
%X Automated malaria parasite detection using deep learning holds promise for addressing diagnostic gaps in resource-limited settings, yet most studies rely on single-dataset evaluations that fail to capture real-world variability. In this work, we rigorously validate YOLOv12-based architectures for malaria detection across diverse geographic and institutional contexts. We introduce a dual-head architecture combining instance segmentation with a high-resolution P2 detection head to target tiny ring-stage parasites. Our evaluation on a diverse Rwandan thick-smear dataset (2,739 images) and two external datasets from Ghana (Lacuna) and Nigeria (FASTMAL) reveals critical insights into model robustness. While the proposed YOLOv12-Seg-N-P2 model achieves state-of-the-art internal performance (mAP@50 $0.888$) and significantly improves detection of challenging P. vivax ($+10.9%$) and P. falciparum ring forms, external validation exposes severe domain shift, with performance dropping by $>80%$ on unseen datasets. We further demonstrate that while P2 heads enhance morphological precision on source data, they reduce zero-shot generalization, likely by overfitting to dataset-specific acquisition characteristics. We additionally evaluate white blood cell (WBC)-anchored stain normalization and pixel-scale rescaling as inference-time domain adaptation strategies. While WBC detection improves substantially (up to $+45%$ on Lacuna), P. falciparum detection remains critically low across both external datasets despite partial recovery on FASTMAL, confirming that preprocessing-based adaptation alone is insufficient for reliable cross-site parasite detection.
APA
Issah, A. T., Seidu, I., & Mukamakuza, C. (2026). Detection versus Instance Segmentation for Multi-Species Malaria Diagnosis: A Head-to-Head Comparison and Multi-Dataset Validation of YOLOv12 Architectures with Small Object Optimization. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:4683-4702. Available from https://proceedings.mlr.press/v315/issah26a.html.
