Rad-Phi4-Vision-CXR: A Compact Multimodal Assistant for Versatile Radiology Workflows

Mercy Prasanna Ranjit; Anirban Porya; Shaury Srivastav; Niharika Vadlamudi; Nikhilesh Chowdary Eathamukkala; Shashank Udyavar; Rahul Kumar; Tanuja Ganu

Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:629-660, 2026.

Abstract

The integration of artificial intelligence into radiology underscores the need for efficient models capable of supporting a wide range of clinical tasks. We introduce Rad-Phi4-VisionCXR, a compact multimodal vision-language model designed to seamlessly integrate into radiology workflows for chest X-rays. It supports radiology report generation, fine-grained visual question answering (VQA) for abnormalities and tubes/lines (including presence and placement), and grounding capabilities for anatomies, pathologies, and medical devices. Beyond these tasks, we propose a capability for findings generation with causal exploration of radiology findings and differential diagnosis, enabling the model to affirm findings or rule out conditions, thereby enhancing its utility in clinical decision-making. Rad-Phi4-VisionCXR achieves state-of-the-art performance on the ReXrank benchmark for report generation, VQA, and grounding. Its compact architecture provides a scalable, high-performance solution for AI-driven radiology.

Cite this Paper

BibTeX

@InProceedings{pmlr-v297-ranjit26a,
  title = 	 {Rad-Phi4-Vision-CXR: A Compact Multimodal Assistant for Versatile Radiology Workflows},
  author =       {Ranjit, Mercy Prasanna and Porya, Anirban and Srivastav, Shaury and Vadlamudi, Niharika and Eathamukkala, Nikhilesh Chowdary and Udyavar, Shashank and Kumar, Rahul and Ganu, Tanuja},
  booktitle = 	 {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = 	 {629--660},
  year = 	 {2026},
  editor = 	 {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = 	 {297},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--14 Dec},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v297/main/assets/ranjit26a/ranjit26a.pdf},
  url = 	 {https://proceedings.mlr.press/v297/ranjit26a.html},
  abstract = 	 {The integration of artificial intelligence into radiology underscores the need for efficient models capable of supporting a wide range of clinical tasks. We introduce Rad-Phi4-VisionCXR, a compact multimodal vision-language model designed to seamlessly integrate into radiology workflows for chest X-rays. It supports radiology report generation, fine-grained visual question answering ({VQA}) for abnormalities and tubes/lines (including presence and placement), and grounding capabilities for anatomies, pathologies, and medical devices. Beyond these tasks, we propose a capability for findings generation with causal exploration of radiology findings and differential diagnosis, enabling the model to affirm findings or rule out conditions, thereby enhancing its utility in clinical decision-making. Rad-Phi4-VisionCXR achieves state-of-the-art performance on the ReXrank benchmark for report generation, {VQA}, and grounding. Its compact architecture provides a scalable, high-performance solution for {AI}-driven radiology.}
}

Endnote

%0 Conference Paper
%T Rad-Phi4-Vision-CXR: A Compact Multimodal Assistant for Versatile Radiology Workflows
%A Mercy Prasanna Ranjit
%A Anirban Porya
%A Shaury Srivastav
%A Niharika Vadlamudi
%A Nikhilesh Chowdary Eathamukkala
%A Shashank Udyavar
%A Rahul Kumar
%A Tanuja Ganu
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-ranjit26a
%I PMLR
%P 629--660
%U https://proceedings.mlr.press/v297/ranjit26a.html
%V 297
%X The integration of artificial intelligence into radiology underscores the need for efficient models capable of supporting a wide range of clinical tasks. We introduce Rad-Phi4-VisionCXR, a compact multimodal vision-language model designed to seamlessly integrate into radiology workflows for chest X-rays. It supports radiology report generation, fine-grained visual question answering (VQA) for abnormalities and tubes/lines (including presence and placement), and grounding capabilities for anatomies, pathologies, and medical devices. Beyond these tasks, we propose a capability for findings generation with causal exploration of radiology findings and differential diagnosis, enabling the model to affirm findings or rule out conditions, thereby enhancing its utility in clinical decision-making. Rad-Phi4-VisionCXR achieves state-of-the-art performance on the ReXrank benchmark for report generation, VQA, and grounding. Its compact architecture provides a scalable, high-performance solution for AI-driven radiology.

APA

Ranjit, M.P., Porya, A., Srivastav, S., Vadlamudi, N., Eathamukkala, N.C., Udyavar, S., Kumar, R. & Ganu, T. (2026). Rad-Phi4-Vision-CXR: A Compact Multimodal Assistant for Versatile Radiology Workflows. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:629-660. Available from https://proceedings.mlr.press/v297/ranjit26a.html.
