Beyond the Prompt: Deploying Medical Foundation Models on Diverse Chest X-ray Populations
Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:428-446, 2026.
Abstract
Foundation models (FMs) have shown impressive performance in medical image analysis tasks, but deploying them in real-world clinical settings remains challenging, especially across diverse patient populations such as adult and pediatric cases. Key open questions include which prompting techniques work best and how to adapt or fine-tune models for clinical use. In this study, we evaluated different approaches for deploying FMs in clinical scenarios for diverse patient populations. We used the lightweight, embedding-based vision-language FM MedImageInsight to predict pneumonia from chest X-rays, a condition common in both adult and pediatric patients. We observed a large variation in predictive performance depending on the text prompt design, which highlights how important prompt design is for successful zero-shot (ZS) application. On in-domain datasets, performance differed across text prompts by up to 46% in Matthews correlation coefficient (MCC) and up to 56% in true positive rate. By introducing text and vision embedding ensembles, we achieved substantial ZS improvements, outperforming training-based methods (fine-tuning, linear probe) in low-data scenarios by up to 43% in MCC for adults and 35% for pediatric populations. This ensembling strategy also supports resource-efficient, equitable clinical use across demographic subgroups, with MCC improvements over linear probe of 6% across sex, 17% across age, and 10% across race subgroups.
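To make the ensembling idea concrete, here is a minimal plain-Python sketch of zero-shot classification with an embedding-based vision-language model, plus the standard MCC formula used as the metric. It assumes that a text ensemble means averaging the L2-normalized embeddings of several prompts per class, and that a vision ensemble means averaging embeddings of several views of the same image. The abstract does not specify how the paper builds either ensemble, so treat both as assumptions. The helper names (`zero_shot_predict`, `mean_embedding`, `mcc`) and the toy 3-dimensional vectors are hypothetical illustrations, not MedImageInsight outputs or part of any MedImageInsight API.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def mean_embedding(vectors):
    """Ensemble embeddings: average the unit-normalized vectors, then re-normalize."""
    unit = [normalize(v) for v in vectors]
    dim = len(unit[0])
    avg = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return normalize(avg)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def zero_shot_predict(image_views, prompts_by_class):
    """Pick the class whose ensembled prompt embedding is closest to the
    ensembled image embedding (assumed ensembling scheme, see lead-in)."""
    img = mean_embedding(image_views)
    class_emb = {c: mean_embedding(p) for c, p in prompts_by_class.items()}
    return max(class_emb, key=lambda c: cosine(img, class_emb[c]))


def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy, made-up embeddings standing in for model outputs.
prompts = {
    "pneumonia": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.2]],
    "normal": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
views = [[0.8, 0.2, 0.1], [0.9, 0.1, 0.0]]
print(zero_shot_predict(views, prompts))  # pneumonia
print(round(mcc([1, 1, 0, 0], [1, 0, 0, 0]), 4))  # 0.5774
```

Averaging normalized embeddings before re-normalizing gives each prompt or view equal weight, which is one reason an ensemble can damp the sensitivity to a single prompt's wording that the abstract reports.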