RaDialog: Large Vision-Language Models for X-Ray Reporting and Dialog-Driven Assistance

Chantal Pellegrini; Ege Özsoy; Benjamin Busam; Benedikt Wiestler; Nassir Navab; Matthias Keicher

Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:1294-1312, 2026.

Abstract

Conversational AI tools for generating and discussing accurate radiology reports could transform radiology by enabling collaborative, human-in-the-loop diagnostic processes, saving time and enhancing report quality. While, to this end, Large Vision-Language Models hold promise, current methods lack clinical correctness or are single-task models without conversational abilities. We propose a novel architecture and dataset to address these limitations. First, we propose a secondary image branch, explicitly focusing on structured clinical findings, improving the clinical correctness score by 13.3%. Second, we propose a catastrophic forgetting mitigation strategy and instruct dataset with variable dialog-based tasks, to enable our model to handle a multitude of different queries. RaDialog marks a foundational step toward clinical dialog systems, outperforming existing medical LVLMs by 15.0% in clinical correctness in report generation, 23.4% in interactive report correction, and is preferred by radiologists in 84.0% of cases over a comparative method. Our model and dataset are publicly available (https://github.com/ChantalMP/RaDialog and https://physionet.org/content/radialog-instruct-dataset/1.1.0/).

Cite this Paper

BibTeX

@InProceedings{pmlr-v301-pellegrini26a,
  title = {RaDialog: Large Vision-Language Models for X-Ray Reporting and Dialog-Driven Assistance},
  author = {Pellegrini, Chantal and {\"O}zsoy, Ege and Busam, Benjamin and Wiestler, Benedikt and Navab, Nassir and Keicher, Matthias},
  booktitle = {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning},
  pages = {1294--1312},
  year = {2026},
  editor = {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan},
  volume = {301},
  series = {Proceedings of Machine Learning Research},
  month = {09--11 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v301/main/assets/pellegrini26a/pellegrini26a.pdf},
  url = {https://proceedings.mlr.press/v301/pellegrini26a.html},
  abstract = 	 {Conversational AI tools for generating and discussing accurate radiology reports could transform radiology by enabling collaborative, human-in-the-loop diagnostic processes, saving time and enhancing report quality. While, to this end, Large Vision-Language Models hold promise, current methods lack clinical correctness or are single-task models without conversational abilities. We propose a novel architecture and dataset to address these limitations. First, we propose a secondary image branch, explicitly focusing on structured clinical findings, improving the clinical correctness score by 13.3%. Second, we propose a catastrophic forgetting mitigation strategy and instruct dataset with variable dialog-based tasks, to enable our model to handle a multitude of different queries. RaDialog marks a foundational step toward clinical dialog systems, outperforming existing medical LVLMs by 15.0% in clinical correctness in report generation, 23.4% in interactive report correction, and is preferred by radiologists in 84.0% of cases over a comparative method. Our model and dataset are publicly available (https://github.com/ChantalMP/RaDialog and https://physionet.org/content/radialog-instruct-dataset/1.1.0/).}
}

Endnote

%0 Conference Paper
%T RaDialog: Large Vision-Language Models for X-Ray Reporting and Dialog-Driven Assistance
%A Chantal Pellegrini
%A Ege Özsoy
%A Benjamin Busam
%A Benedikt Wiestler
%A Nassir Navab
%A Matthias Keicher
%B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Tolga Tasdizen
%E Shireen Elhabian
%E Ronald Summers
%E Chen Chen
%E Lisa Koch
%E Yan Zhuang
%F pmlr-v301-pellegrini26a
%I PMLR
%P 1294--1312
%U https://proceedings.mlr.press/v301/pellegrini26a.html
%V 301
%X Conversational AI tools for generating and discussing accurate radiology reports could transform radiology by enabling collaborative, human-in-the-loop diagnostic processes, saving time and enhancing report quality. While, to this end, Large Vision-Language Models hold promise, current methods lack clinical correctness or are single-task models without conversational abilities. We propose a novel architecture and dataset to address these limitations. First, we propose a secondary image branch, explicitly focusing on structured clinical findings, improving the clinical correctness score by 13.3%. Second, we propose a catastrophic forgetting mitigation strategy and instruct dataset with variable dialog-based tasks, to enable our model to handle a multitude of different queries. RaDialog marks a foundational step toward clinical dialog systems, outperforming existing medical LVLMs by 15.0% in clinical correctness in report generation, 23.4% in interactive report correction, and is preferred by radiologists in 84.0% of cases over a comparative method. Our model and dataset are publicly available (https://github.com/ChantalMP/RaDialog and https://physionet.org/content/radialog-instruct-dataset/1.1.0/).

APA

Pellegrini, C., Özsoy, E., Busam, B., Wiestler, B., Navab, N., & Keicher, M. (2026). RaDialog: Large Vision-Language Models for X-Ray Reporting and Dialog-Driven Assistance. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:1294-1312. Available from https://proceedings.mlr.press/v301/pellegrini26a.html.
