Comparing Foundation Models for Medical Images: A Study on Limited Data and Generalization

Ingrid Utseth, Amund Hansen Vedal, Sarina Thomas, Line Eikvil
Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), PMLR 307:439-447, 2026.

Abstract

In this study we have investigated how vision foundation models, pretrained on different domains, compete with a specialized model for classification as a function of the size of the labeled training set of medical images. Furthermore, we have looked into the different models’ ability to generalize to difficult cases. Our experiments are conducted for cardiac ultrasound images and the downstream task of view recognition. Still, this classification task is meant to serve as a demonstrative example, where we think that the findings should be transferable to other classification tasks and other domains. Through these experiments we found that the foundation models were able to beat the performance of our task-specific supervised model when labelled training data were limited. This was true even for models trained on natural images and when using the simple linear probing method to create a classifier. We observed that more domain-specific foundation models achieved an even higher performance with limited data. On the other hand, the more general models showed a greater ability to generalize and perform well on difficult, out-of-distribution cases. Still, for typical in-domain cases with sufficient labeled data, a task-specific ResNet model was competitive with the foundation models, while also being both smaller and faster.
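The abstract mentions using simple linear probing to build a classifier on top of a frozen foundation model. As a minimal sketch of that idea, the toy example below stands in for the paper's setup: random vectors around per-class means play the role of frozen encoder embeddings of ultrasound frames (the encoder, data, and dimensions here are all hypothetical), and only a linear map is fit on top of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, dim, n_views = 200, 50, 64, 5
class_means = rng.normal(0.0, 2.0, size=(n_views, dim))

def embed(labels):
    """Hypothetical frozen encoder: class mean plus unit Gaussian noise."""
    return class_means[labels] + rng.normal(size=(len(labels), dim))

y_train = rng.integers(0, n_views, n_train)
y_test = rng.integers(0, n_views, n_test)
X_train, X_test = embed(y_train), embed(y_test)

# Linear probing: the encoder stays frozen; we fit only a linear map
# from its features to one-hot view labels, here via least squares.
Y = np.eye(n_views)[y_train]
A = np.hstack([X_train, np.ones((n_train, 1))])  # append a bias column
W, *_ = np.linalg.lstsq(A, Y, rcond=None)

# Predict by taking the argmax over the linear scores.
A_test = np.hstack([X_test, np.ones((n_test, 1))])
pred = (A_test @ W).argmax(axis=1)
acc = (pred == y_test).mean()
```

In the paper's actual experiments the embeddings would come from the pretrained vision models and the probe would typically be a logistic-regression head, but the key property is the same: no encoder weights are updated, so very little labeled data is needed.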

Cite this Paper


BibTeX
@InProceedings{pmlr-v307-utseth26a,
  title     = {Comparing Foundation Models for Medical Images: A Study on Limited Data and Generalization},
  author    = {Utseth, Ingrid and Vedal, Amund Hansen and Thomas, Sarina and Eikvil, Line},
  booktitle = {Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)},
  pages     = {439--447},
  year      = {2026},
  editor    = {Kim, Hyeongji and Ramírez Rivera, Adín and Ricaud, Benjamin},
  volume    = {307},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--08 Jan},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v307/main/assets/utseth26a/utseth26a.pdf},
  url       = {https://proceedings.mlr.press/v307/utseth26a.html},
  abstract  = {In this study we have investigated how vision foundation models, pretrained on different domains, compete with a specialized model for classification as a function of the size of the labeled training set of medical images. Furthermore, we have looked into the different models’ ability to generalize to difficult cases. Our experiments are conducted for cardiac ultrasound images and the downstream task of view recognition. Still, this classification task is meant to serve as a demonstrative example, where we think that the findings should be transferable to other classification tasks and other domains. Through these experiments we found that the foundation models were able to beat the performance of our task-specific supervised model when labelled training data were limited. This was true even for models trained on natural images and when using the simple linear probing method to create a classifier. We observed that more domain-specific foundation models achieved an even higher performance with limited data. On the other hand, the more general models showed a greater ability to generalize and perform well on difficult, out-of-distribution cases. Still, for typical in-domain cases with sufficient labeled data, a task-specific ResNet model was competitive with the foundation models, while also being both smaller and faster.}
}
Endnote
%0 Conference Paper
%T Comparing Foundation Models for Medical Images: A Study on Limited Data and Generalization
%A Ingrid Utseth
%A Amund Hansen Vedal
%A Sarina Thomas
%A Line Eikvil
%B Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)
%C Proceedings of Machine Learning Research
%D 2026
%E Hyeongji Kim
%E Adín Ramírez Rivera
%E Benjamin Ricaud
%F pmlr-v307-utseth26a
%I PMLR
%P 439--447
%U https://proceedings.mlr.press/v307/utseth26a.html
%V 307
%X In this study we have investigated how vision foundation models, pretrained on different domains, compete with a specialized model for classification as a function of the size of the labeled training set of medical images. Furthermore, we have looked into the different models’ ability to generalize to difficult cases. Our experiments are conducted for cardiac ultrasound images and the downstream task of view recognition. Still, this classification task is meant to serve as a demonstrative example, where we think that the findings should be transferable to other classification tasks and other domains. Through these experiments we found that the foundation models were able to beat the performance of our task-specific supervised model when labelled training data were limited. This was true even for models trained on natural images and when using the simple linear probing method to create a classifier. We observed that more domain-specific foundation models achieved an even higher performance with limited data. On the other hand, the more general models showed a greater ability to generalize and perform well on difficult, out-of-distribution cases. Still, for typical in-domain cases with sufficient labeled data, a task-specific ResNet model was competitive with the foundation models, while also being both smaller and faster.
APA
Utseth, I., Vedal, A.H., Thomas, S. & Eikvil, L. (2026). Comparing Foundation Models for Medical Images: A Study on Limited Data and Generalization. Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 307:439-447. Available from https://proceedings.mlr.press/v307/utseth26a.html.