Comparing Computational Pathology Foundation Models using Representational Similarity Analysis

Vaibhav Mishra, William Lotter
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:1007-1022, 2026.

Abstract

Foundation models are increasingly developed in computational pathology (CPath) given their promise in facilitating many downstream tasks. While recent studies have evaluated task performance across models, less is known about the structure and variability of their learned representations. Here, we systematically analyze the representational spaces of six CPath foundation models using techniques popularized in computational neuroscience. The models analyzed span vision-language contrastive learning (CONCH, PLIP, KEEP) and self-distillation (UNI (v2), Virchow (v2), Prov-GigaPath) approaches. Through representational similarity analysis using H&E image patches from TCGA, we find that UNI2 and Virchow2 have the most distinct representational structures, whereas Prov-GigaPath has the highest average similarity across models. Having the same training paradigm (vision-only vs. vision-language) did not guarantee higher representational similarity. The representations of all models showed high slide-dependence but relatively low disease-dependence. Stain normalization decreased slide-dependence for all models, by amounts ranging from 5.5% (CONCH) to 20.5% (PLIP). In terms of intrinsic dimensionality, vision-language models demonstrated relatively compact representations compared to the more distributed representations of vision-only models. These findings highlight opportunities to improve robustness to slide-specific features, inform model ensembling strategies, and provide insights into how training paradigms shape model representations. Our framework is extendable across medical imaging domains, where probing the internal representations of foundation models can support their effective development and deployment.
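To make the analysis concrete, below is a minimal sketch of the two core quantities the abstract refers to: an RSA score between two models (here, the Spearman correlation of their representational dissimilarity matrices, with 1 minus Pearson correlation as the patch-wise distance) and a PCA-based intrinsic dimensionality estimate (the participation ratio). These metric choices, and the toy embedding matrices emb_uni and emb_conch, are common-practice assumptions made for illustration, not the paper's released implementation.

import numpy as np
from scipy.stats import spearmanr

def rdm(embeddings):
    # Representational dissimilarity matrix: 1 - Pearson r between
    # every pair of patch embeddings (rows of the matrix).
    return 1.0 - np.corrcoef(embeddings)

def rsa_similarity(emb_a, emb_b):
    # Spearman correlation between the upper triangles of the two RDMs;
    # both models must embed the same patches in the same order.
    iu = np.triu_indices(emb_a.shape[0], k=1)
    rho, _ = spearmanr(rdm(emb_a)[iu], rdm(emb_b)[iu])
    return rho

def participation_ratio(embeddings):
    # One common intrinsic dimensionality estimate: (sum of eigenvalues)^2
    # divided by the sum of squared eigenvalues of the feature covariance.
    lam = np.clip(np.linalg.eigvalsh(np.cov(embeddings, rowvar=False)), 0.0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

# Toy usage with random stand-ins for two models' embeddings of the
# same 100 H&E patches (hypothetical embedding dimensions).
rng = np.random.default_rng(0)
emb_uni = rng.standard_normal((100, 1536))
emb_conch = rng.standard_normal((100, 512))
print(f"RSA similarity: {rsa_similarity(emb_uni, emb_conch):.3f}")
print(f"Intrinsic dimensionality (PR): {participation_ratio(emb_uni):.1f}")

With real models, the embedding matrices would come from running each encoder on an identical set of TCGA patches; slide- or disease-dependence can then be probed by checking how the RDM structure aligns with slide or diagnosis labels.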

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-mishra26a,
  title     = {Comparing Computational Pathology Foundation Models using Representational Similarity Analysis},
  author    = {Mishra, Vaibhav and Lotter, William},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages     = {1007--1022},
  year      = {2026},
  editor    = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume    = {297},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--14 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/mishra26a/mishra26a.pdf},
  url       = {https://proceedings.mlr.press/v297/mishra26a.html}
}
Endnote
%0 Conference Paper
%T Comparing Computational Pathology Foundation Models using Representational Similarity Analysis
%A Vaibhav Mishra
%A William Lotter
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-mishra26a
%I PMLR
%P 1007--1022
%U https://proceedings.mlr.press/v297/mishra26a.html
%V 297
APA
Mishra, V. & Lotter, W. (2026). Comparing Computational Pathology Foundation Models using Representational Similarity Analysis. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:1007-1022. Available from https://proceedings.mlr.press/v297/mishra26a.html.