Rad-Phi4-Vision-CXR: A Compact Multimodal Assistant for Versatile Radiology Workflows
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:629-660, 2026.
Abstract
The integration of artificial intelligence into radiology underscores the need for efficient models capable of supporting a wide range of clinical tasks. We introduce Rad-Phi4-Vision-CXR, a compact multimodal vision-language model designed to integrate seamlessly into radiology workflows for chest X-rays. It supports radiology report generation, fine-grained visual question answering (VQA) for abnormalities and tubes/lines (including presence and placement), and grounding of anatomies, pathologies, and medical devices. Beyond these tasks, we propose a capability for findings generation with causal exploration of radiology findings and differential diagnosis, enabling the model to affirm findings or rule out conditions, thereby enhancing its utility in clinical decision-making. Rad-Phi4-Vision-CXR achieves state-of-the-art performance on the ReXrank benchmark for report generation, VQA, and grounding. Its compact architecture provides a scalable, high-performance solution for AI-driven radiology.