Uncertainty Estimation in Large Vision Language Models for Automated Radiology Report Generation
Proceedings of the 4th Machine Learning for Health Symposium, PMLR 259:1039-1052, 2025.
Abstract
The automated generation of free-text radiology reports is crucial for improving diagnosis and treatment in clinical practice. The latest chest X-ray (CXR) report generation models use large vision language model (LVLM) architectures, which demand a higher level of interpretability for clinical deployment. Uncertainty estimation scores can help clinicians evaluate the reliability of these model outputs and promote broader adoption of automated systems. In this paper, we conduct a comprehensive evaluation of the correlation between 16 LLM uncertainty scores and 6 radiology report evaluation metrics across 4 state-of-the-art LVLMs for CXR report generation. Our findings show a strong Pearson correlation, ranging from 0.4 to 0.6 on a scale from -1 to 1, for several models. We provide a detailed analysis of these uncertainty scores and evaluation metrics, offering insights into applying these methods in real clinical settings. This study is the first to evaluate LLM-based uncertainty estimation scores for CXR report generation LVLMs, establishing a benchmark and laying the groundwork for their adoption in clinical practice.
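The correlation analysis summarized in the abstract can be illustrated with a minimal sketch, assuming one uncertainty score and one report-quality metric per generated report. The function name `pearson` and all numeric values are hypothetical placeholders, not data or code from the paper; the sign convention (higher uncertainty paired with lower quality) is likewise an assumption made only for this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-report values, for illustration only.
uncertainty = [0.10, 0.35, 0.20, 0.80, 0.55]  # assumed uncertainty scores
quality = [0.90, 0.70, 0.85, 0.30, 0.50]      # assumed evaluation-metric scores

r = pearson(uncertainty, quality)
print(f"Pearson r = {r:.3f}")
```

In this toy setup the coefficient is negative because uncertainty rises as quality falls; whether a study reports the signed value, its magnitude, or the correlation with a confidence score instead depends on its own conventions.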