Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:424-448, 2026.
Abstract
Vision-language models (VLMs) often produce chain-of-thought (CoT) explanations that sound plausible yet fail to reflect the underlying decision process, undermining trust in high-stakes clinical use. Existing evaluations rarely catch this misalignment, prioritizing answer accuracy or adherence to formats. We present a clinically grounded framework for chest X-ray visual question answering (VQA) that probes CoT faithfulness via controlled text and image modifications across three axes: clinical fidelity, causal attribution, and confidence calibration. In a reader study (n=4), evaluator-radiologist correlations fall within the observed inter-radiologist range for all axes, with strong alignment for attribution (Kendall's tau-b = 0.670), moderate alignment for fidelity (tau-b = 0.387), and weak alignment for confidence tone (tau-b = 0.091), which we report with caution. Benchmarking six VLMs shows that answer accuracy and explanation quality can be decoupled, that acknowledging injected cues does not ensure grounding, and that text cues shift explanations more than visual cues. While some open-source models match proprietary models on final answer accuracy, proprietary models score higher on attribution (25.0% vs. 1.4%) and often on fidelity (36.1% vs. 31.7%), highlighting deployment risks and the need to evaluate beyond final answer accuracy.
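
As a minimal sketch of how the evaluator-radiologist agreement reported above could be computed, the snippet below applies Kendall's tau-b to paired ordinal ratings using SciPy; the score arrays and variable names are hypothetical placeholders, not the paper's data.

# Minimal sketch: agreement between an automated evaluator and a radiologist,
# measured with Kendall's tau-b as cited in the abstract.
# The ratings below are hypothetical placeholders, not the study's data.
from scipy.stats import kendalltau

# Paired ordinal scores (e.g., Likert-style ratings) for the same explanations.
evaluator_scores   = [3, 4, 2, 5, 1, 4, 3, 2]
radiologist_scores = [3, 5, 2, 4, 1, 4, 2, 2]

tau_b, p_value = kendalltau(evaluator_scores, radiologist_scores, variant="b")
print(f"Kendall's tau-b = {tau_b:.3f} (p = {p_value:.3f})")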