Multimodal Cancer Modeling in the Age of Foundation Model Embeddings

Steven Song, Morgan Borjigin-Wang, Irene R. Madejski, Robert L. Grossman
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:202-227, 2026.

Abstract

The Cancer Genome Atlas (TCGA) has enabled novel discoveries and served as a large-scale reference dataset in cancer through its harmonized genomics, clinical, and imaging data. Numerous prior studies have developed bespoke deep learning models over TCGA for tasks such as cancer survival prediction. A modern paradigm in biomedical deep learning is the development of foundation models (FMs) that derive feature embeddings agnostic to any specific modeling task. Biomedical text in particular has seen growing development of FMs. Although TCGA contains free-text data in the form of pathology reports, these have historically been underutilized. Here, we investigate training classical machine learning models over multimodal, zero-shot FM embeddings of cancer data. We demonstrate that multimodal fusion is easy to perform and has an additive effect, with fused models outperforming unimodal ones. Further, we show the benefit of including pathology report text and rigorously evaluate the effects of model-based text summarization and hallucination. Overall, we propose an embedding-centric approach to multimodal cancer modeling.
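The abstract does not say how the per-modality embeddings are fused or which classical models are trained. The sketch below is therefore only an illustration under stated assumptions, not the authors' pipeline. It assumes embeddings have already been computed by frozen (zero-shot) FMs, fuses them by L2-normalizing each modality and concatenating (early fusion), and fits a plain logistic-regression classifier on a binary label. The modality names `report_text` and `slide_image`, the helper functions, and the synthetic data are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical type: a precomputed, frozen (zero-shot) FM embedding for one modality.
Embedding = List[float]


def l2_normalize(vec: Sequence[float]) -> Embedding:
    """Scale a vector to unit L2 norm so no single modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def fuse(sample: Dict[str, Embedding], modalities: Sequence[str]) -> Embedding:
    """Assumed early fusion: normalize each modality, then concatenate in a fixed order."""
    fused: Embedding = []
    for name in modalities:
        fused.extend(l2_normalize(sample[name]))
    return fused


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X: List[Embedding], y: List[int], lr: float = 0.5,
                 epochs: int = 200, l2: float = 1e-3) -> List[float]:
    """Fit L2-regularized logistic regression by full-batch gradient descent.
    Returns the weights, with the (unregularized) bias in the last position."""
    d = len(X[0])
    n = len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            # zip stops after d features, so the bias is added separately.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        for j in range(d):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
        w[-1] -= lr * grad[-1] / n
    return w


def predict_proba(w: List[float], x: Embedding) -> float:
    """Probability of the positive class for one fused embedding."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


if __name__ == "__main__":
    rng = random.Random(0)
    modalities = ["report_text", "slide_image"]  # hypothetical modality names
    patients, labels = [], []
    for i in range(40):  # synthetic toy data, not TCGA
        label = i % 2
        shift = 1.0 if label else -1.0
        patients.append({
            "report_text": [shift + rng.gauss(0, 0.3) for _ in range(4)],
            "slide_image": [shift + rng.gauss(0, 0.3) for _ in range(3)],
        })
        labels.append(label)
    X = [fuse(p, modalities) for p in patients]
    w = train_logreg(X, labels)
    acc = sum((predict_proba(w, x) >= 0.5) == bool(t) for x, t in zip(X, labels)) / len(X)
    print(f"fused dim={len(X[0])}, train accuracy={acc:.2f}")
```

Per-modality normalization before concatenation is one common way to keep embeddings of different widths and scales on comparable footing. A unimodal baseline is the same call with a single entry in `modalities`.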

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-song26a,
  title = {Multimodal Cancer Modeling in the Age of Foundation Model Embeddings},
  author = {Song, Steven and Borjigin-Wang, Morgan and Madejski, Irene R. and Grossman, Robert L.},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = {202--227},
  year = {2026},
  editor = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = {297},
  series = {Proceedings of Machine Learning Research},
  month = {13--14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/song26a/song26a.pdf},
  url = {https://proceedings.mlr.press/v297/song26a.html},
  abstract = {The Cancer Genome Atlas ({TCGA}) has enabled novel discoveries and served as a large-scale reference dataset in cancer through its harmonized genomics, clinical, and imaging data. Numerous prior studies have developed bespoke deep learning models over {TCGA} for tasks such as cancer survival prediction. A modern paradigm in biomedical deep learning is the development of foundation models ({FM}s) to derive feature embeddings agnostic to a specific modeling task. Biomedical text especially has seen growing development of {FM}s. While {TCGA} contains free-text data as pathology reports, these have been historically underutilized. Here, we investigate the ability to train classical machine learning models over multimodal, zero-shot {FM} embeddings of cancer data. We demonstrate the ease and additive effect of multimodal fusion, outperforming unimodal models. Further, we show the benefit of including pathology report text and rigorously evaluate the effect of model-based text summarization and hallucination. Overall, we propose an embedding-centric approach to multimodal cancer modeling.}
}
Endnote
%0 Conference Paper
%T Multimodal Cancer Modeling in the Age of Foundation Model Embeddings
%A Steven Song
%A Morgan Borjigin-Wang
%A Irene R. Madejski
%A Robert L. Grossman
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-song26a
%I PMLR
%P 202--227
%U https://proceedings.mlr.press/v297/song26a.html
%V 297
%X The Cancer Genome Atlas ({TCGA}) has enabled novel discoveries and served as a large-scale reference dataset in cancer through its harmonized genomics, clinical, and imaging data. Numerous prior studies have developed bespoke deep learning models over {TCGA} for tasks such as cancer survival prediction. A modern paradigm in biomedical deep learning is the development of foundation models ({FM}s) to derive feature embeddings agnostic to a specific modeling task. Biomedical text especially has seen growing development of {FM}s. While {TCGA} contains free-text data as pathology reports, these have been historically underutilized. Here, we investigate the ability to train classical machine learning models over multimodal, zero-shot {FM} embeddings of cancer data. We demonstrate the ease and additive effect of multimodal fusion, outperforming unimodal models. Further, we show the benefit of including pathology report text and rigorously evaluate the effect of model-based text summarization and hallucination. Overall, we propose an embedding-centric approach to multimodal cancer modeling.
APA
Song, S., Borjigin-Wang, M., Madejski, I.R. & Grossman, R.L. (2026). Multimodal Cancer Modeling in the Age of Foundation Model Embeddings. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:202-227. Available from https://proceedings.mlr.press/v297/song26a.html.
