Detection versus Instance Segmentation for Multi-Species Malaria Diagnosis: A Head-to-Head Comparison and Multi-Dataset Validation of YOLOv12 Architectures with Small Object Optimization
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:4683-4702, 2026.
Abstract
Automated malaria parasite detection using deep learning holds promise for addressing diagnostic gaps in resource-limited settings, yet most studies rely on single-dataset evaluations that fail to capture real-world variability. In this work, we rigorously validate YOLOv12-based architectures for malaria detection across diverse geographic and institutional contexts. We introduce a dual-head architecture that combines instance segmentation with a high-resolution P2 detection head to target tiny ring-stage parasites. Our evaluation on a diverse Rwandan thick-smear dataset (2,739 images) and two external datasets from Ghana (Lacuna) and Nigeria (FASTMAL) reveals critical insights into model robustness. Although the proposed YOLOv12-Seg-N-P2 model achieves state-of-the-art internal performance (mAP@50 of $0.888$) and significantly improves detection of challenging P. vivax ($+10.9\%$) and P. falciparum ring forms, external validation exposes severe domain shift, with performance dropping by $>80\%$ on unseen datasets. We further demonstrate that P2 heads enhance morphological precision on source data but reduce zero-shot generalization, likely by overfitting to dataset-specific acquisition characteristics. We additionally evaluate white blood cell (WBC)-anchored stain normalization and pixel-scale rescaling as inference-time domain adaptation strategies. Although WBC detection improves substantially (up to $+45\%$ on Lacuna), P. falciparum detection remains critically low on both external datasets despite partial recovery on FASTMAL, confirming that preprocessing-based adaptation alone is insufficient for reliable cross-site parasite detection.