Beyond the Prompt: Deploying Medical Foundation Models on Diverse Chest X-ray Populations

Louisa Fay; Jean-Benoit Delbrouck; Thomas Küstner; Bin Yang; Noel C Codella; Matthew P. Lungren; Curtis Langlotz; Sergios Gatidis

Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:428-446, 2026.

Abstract

Foundation models (FMs) have shown impressive performance in medical image analysis tasks, but their deployment in real-world clinical settings, especially across diverse patient populations such as adult and pediatric cases, remains challenging. Key open questions include optimal prompting techniques and strategies for model adaptation or fine-tuning for clinical use. In this study, we evaluated different approaches for deploying FMs in clinical scenarios for diverse patient populations. We used the lightweight, embedding-based vision-language FM $\textit{MedImageInsight}$ to predict pneumonia from chest X-rays, a condition common in both adult and pediatric patients. We observed a large variation in model predictive performance depending on the chosen prompt design, highlighting the importance of text prompt design for successful zero-shot (ZS) application. On in-domain datasets, we found performance differences of up to 46% in Matthews correlation coefficient (MCC) and 56% in true positive rates across different text prompts. By introducing text and vision embedding ensembles, we achieved substantial ZS improvements, outperforming training-based methods (fine-tuning, linear probe) in low-data scenarios by up to 43% for adults and 35% for pediatric populations (MCC). This ensembling strategy also promotes resource-efficient, equitable clinical use by supporting diverse demographic subgroups, achieving MCC improvements of 6% by sex, 17% by age, and 10% by race compared to linear probe.
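
For readers unfamiliar with the headline metric, the following is a minimal sketch of how the Matthews correlation coefficient is computed from binary confusion-matrix counts. It is the standard textbook definition, not code from the paper; the function name `mcc` and the example counts are hypothetical, and returning 0.0 when the denominator vanishes is a common convention assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. The result lies in [-1, 1].
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column of the confusion
        # matrix leaves MCC undefined, so report 0.0 (no correlation).
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for a pneumonia classifier, for illustration only.
    print(mcc(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a reasonable choice for class-imbalanced tasks.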

Cite this Paper

BibTeX

@InProceedings{pmlr-v301-fay26a,
  title = 	 {Beyond the Prompt: Deploying Medical Foundation Models on Diverse Chest X-ray Populations},
  author =       {Fay, Louisa and Delbrouck, Jean-Benoit and K\"ustner, Thomas and Yang, Bin and Codella, Noel C and Lungren, Matthew P. and Langlotz, Curtis and Gatidis, Sergios},
  booktitle = 	 {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {428--446},
  year = 	 {2026},
  editor = 	 {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan},
  volume = 	 {301},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v301/main/assets/fay26a/fay26a.pdf},
  url = 	 {https://proceedings.mlr.press/v301/fay26a.html},
  abstract = 	 {Foundation models (FMs) have shown impressive performance in medical image analysis tasks, but their deployment in real-world clinical settings, especially across diverse patient populations such as adult and pediatric cases, remains challenging. Key open questions include optimal prompting techniques and strategies for model adaptation or fine-tuning for clinical use. In this study, we evaluated different approaches for deploying FMs in clinical scenarios for diverse patient populations. We used the lightweight, embedding-based vision-language FM $\textit{MedImageInsight}$ to predict pneumonia from chest X-rays, a condition common in both adult and pediatric patients. We observed a large variation in model predictive performance depending on the chosen prompt design, highlighting the importance of text prompt design for successful zero-shot (ZS) application. On in-domain datasets, we found performance differences of up to 46% in Matthews correlation coefficient (MCC) and 56% in true positive rates across different text prompts. By introducing text and vision embedding ensembles, we achieved substantial ZS improvements, outperforming training-based methods (fine-tuning, linear probe) in low-data scenarios by up to 43% for adults and 35% for pediatric populations (MCC). This ensembling strategy also promotes resource-efficient, equitable clinical use by supporting diverse demographic subgroups, achieving MCC improvements of 6% by sex, 17% by age, and 10% by race compared to linear probe.}
}

Endnote

%0 Conference Paper
%T Beyond the Prompt: Deploying Medical Foundation Models on Diverse Chest X-ray Populations
%A Louisa Fay
%A Jean-Benoit Delbrouck
%A Thomas Küstner
%A Bin Yang
%A Noel C Codella
%A Matthew P. Lungren
%A Curtis Langlotz
%A Sergios Gatidis
%B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Tolga Tasdizen
%E Shireen Elhabian
%E Ronald Summers
%E Chen Chen
%E Lisa Koch
%E Yan Zhuang
%F pmlr-v301-fay26a
%I PMLR
%P 428--446
%U https://proceedings.mlr.press/v301/fay26a.html
%V 301
%X Foundation models (FMs) have shown impressive performance in medical image analysis tasks, but their deployment in real-world clinical settings, especially across diverse patient populations such as adult and pediatric cases, remains challenging. Key open questions include optimal prompting techniques and strategies for model adaptation or fine-tuning for clinical use. In this study, we evaluated different approaches for deploying FMs in clinical scenarios for diverse patient populations. We used the lightweight, embedding-based vision-language FM $\textit{MedImageInsight}$ to predict pneumonia from chest X-rays, a condition common in both adult and pediatric patients. We observed a large variation in model predictive performance depending on the chosen prompt design, highlighting the importance of text prompt design for successful zero-shot (ZS) application. On in-domain datasets, we found performance differences of up to 46% in Matthews correlation coefficient (MCC) and 56% in true positive rates across different text prompts. By introducing text and vision embedding ensembles, we achieved substantial ZS improvements, outperforming training-based methods (fine-tuning, linear probe) in low-data scenarios by up to 43% for adults and 35% for pediatric populations (MCC). This ensembling strategy also promotes resource-efficient, equitable clinical use by supporting diverse demographic subgroups, achieving MCC improvements of 6% by sex, 17% by age, and 10% by race compared to linear probe.

APA

Fay, L., Delbrouck, J.-B., Küstner, T., Yang, B., Codella, N. C., Lungren, M. P., Langlotz, C., & Gatidis, S. (2026). Beyond the Prompt: Deploying Medical Foundation Models on Diverse Chest X-ray Populations. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:428-446. Available from https://proceedings.mlr.press/v301/fay26a.html.
