Curriculum Learning for Language-guided, Multi-modal Detection of Various Pathologies

Laurenz Adrian Heidrich; Aditya Rastogi; Priyank Upadhya; Gianluca Brugnara; Martha Foltyn-Dumitru; Benedikt Wiestler; Philipp Vollmuth

Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:615-638, 2026.

Abstract

Pathology detection in medical imaging is crucial for radiologists, yet current approaches that train specialized models for each region of interest often lack efficiency and robustness. Furthermore, the scarcity of annotated medical data, particularly for diverse phenotypes, poses significant challenges in achieving generalizability. To address these challenges, we present a novel language-guided object detection pipeline that leverages curriculum learning strategies, chosen for their ability to progressively train models on increasingly complex samples, thereby improving generalization across pathologies, phenotypes, and modalities. We developed a unified pipeline to convert segmentation datasets into bounding box annotations, and applied two curriculum learning approaches - teacher curriculum and bounding box size curriculum - to train a Grounding DINO model. Our method was evaluated on different tumor types in MRI and CT scans and showed significant improvements in detection accuracy. The teacher and bounding box size curriculum learning approaches yielded a 4.9% AP and 5.2% AP increase over baseline, respectively. The results highlight the potential of curriculum learning to optimize medical image analysis and clinical workflow. The code is available at https://github.com/CCI-Bonn/CL4OD.
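As an illustration of the two ideas named in the abstract (deriving bounding boxes from segmentation masks, and ordering training samples by bounding box size), the following is a minimal sketch. It is not taken from the authors' repository. The function names, the inclusive pixel-coordinate box format, and the largest-first ordering are all assumptions made for this example; the abstract does not state the direction of the size curriculum.

```python
import numpy as np


def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing all nonzero pixels
    of a 2D mask, using inclusive pixel coordinates, or None if the
    mask is empty. Hypothetical helper, not the authors' pipeline."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def bbox_area(box):
    """Area in pixels of an inclusive (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min + 1) * (y_max - y_min + 1)


def order_by_box_size(boxes, largest_first=True):
    """Sort boxes by area. Treating larger boxes as the 'easier' early
    samples is an assumption of this sketch."""
    return sorted(boxes, key=bbox_area, reverse=largest_first)


# Toy example: one rectangular lesion in a 6x8 slice.
mask = np.zeros((6, 8), dtype=np.uint8)
mask[2:4, 3:7] = 1
box = mask_to_bbox(mask)
ordered = order_by_box_size([(0, 0, 1, 1), (0, 0, 3, 3), box])
```

In practice a 3D volume would need this applied per slice or per connected component; that choice is left open here because the abstract does not specify it.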

Cite this Paper

BibTeX

@InProceedings{pmlr-v301-heidrich26a,
  title = 	 {Curriculum Learning for Language-guided, Multi-modal Detection of Various Pathologies},
  author =       {Heidrich, Laurenz Adrian and Rastogi, Aditya and Upadhya, Priyank and Brugnara, Gianluca and Foltyn-Dumitru, Martha and Wiestler, Benedikt and Vollmuth, Philipp},
  booktitle = 	 {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {615--638},
  year = 	 {2026},
  editor = 	 {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan},
  volume = 	 {301},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v301/main/assets/heidrich26a/heidrich26a.pdf},
  url = 	 {https://proceedings.mlr.press/v301/heidrich26a.html},
  abstract = 	 {Pathology detection in medical imaging is crucial for radiologists, yet current approaches that train specialized models for each region of interest often lack efficiency and robustness. Furthermore, the scarcity of annotated medical data, particularly for diverse phenotypes, poses significant challenges in achieving generalizability. To address these challenges, we present a novel language-guided object detection pipeline that leverages curriculum learning strategies, chosen for their ability to progressively train models on increasingly complex samples, thereby improving generalization across pathologies, phenotypes, and modalities. We developed a unified pipeline to convert segmentation datasets into bounding box annotations, and applied two curriculum learning approaches - teacher curriculum and bounding box size curriculum - to train a Grounding DINO model. Our method was evaluated on different tumor types in MRI and CT scans and showed significant improvements in detection accuracy. The teacher and bounding box size curriculum learning approaches yielded a 4.9% AP and 5.2% AP increase over baseline, respectively. The results highlight the potential of curriculum learning to optimize medical image analysis and clinical workflow. The code is available at https://github.com/CCI-Bonn/CL4OD.}
}

Endnote

%0 Conference Paper
%T Curriculum Learning for Language-guided, Multi-modal Detection of Various Pathologies
%A Laurenz Adrian Heidrich
%A Aditya Rastogi
%A Priyank Upadhya
%A Gianluca Brugnara
%A Martha Foltyn-Dumitru
%A Benedikt Wiestler
%A Philipp Vollmuth
%B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Tolga Tasdizen
%E Shireen Elhabian
%E Ronald Summers
%E Chen Chen
%E Lisa Koch
%E Yan Zhuang
%F pmlr-v301-heidrich26a
%I PMLR
%P 615--638
%U https://proceedings.mlr.press/v301/heidrich26a.html
%V 301
%X Pathology detection in medical imaging is crucial for radiologists, yet current approaches that train specialized models for each region of interest often lack efficiency and robustness. Furthermore, the scarcity of annotated medical data, particularly for diverse phenotypes, poses significant challenges in achieving generalizability. To address these challenges, we present a novel language-guided object detection pipeline that leverages curriculum learning strategies, chosen for their ability to progressively train models on increasingly complex samples, thereby improving generalization across pathologies, phenotypes, and modalities. We developed a unified pipeline to convert segmentation datasets into bounding box annotations, and applied two curriculum learning approaches - teacher curriculum and bounding box size curriculum - to train a Grounding DINO model. Our method was evaluated on different tumor types in MRI and CT scans and showed significant improvements in detection accuracy. The teacher and bounding box size curriculum learning approaches yielded a 4.9% AP and 5.2% AP increase over baseline, respectively. The results highlight the potential of curriculum learning to optimize medical image analysis and clinical workflow. The code is available at https://github.com/CCI-Bonn/CL4OD.

APA

Heidrich, L. A., Rastogi, A., Upadhya, P., Brugnara, G., Foltyn-Dumitru, M., Wiestler, B. & Vollmuth, P. (2026). Curriculum Learning for Language-guided, Multi-modal Detection of Various Pathologies. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:615-638. Available from https://proceedings.mlr.press/v301/heidrich26a.html.
