GenST: A Generative Cross-Modal Model for Predicting Spatial Transcriptomics from Histology Images

Ruby Wood, Yang Hu, Jens Rittscher, Bin Li
Proceedings of the MICCAI Workshop on Computational Pathology, PMLR 316:52-65, 2026.

Abstract

Spatial transcriptomics is used to identify gene expression levels in certain locations across a tissue sample, preserving important spatial information in cancerous tissue samples for downstream clinical decision making. However, this technology is currently too expensive to be used in routine clinical pathways. On the other hand, digital images of haematoxylin and eosin stained histology slides are routinely generated from tissue biopsy samples. Here, we develop a generative cross-modal method to predict spatial transcriptomics from histology images by aligning the latent spaces of two VQ-VAEs, one for each modality. We benchmark our approach on multiple sequencing technologies (Visium and ST) and cancer types (breast, brain, spinal cord and skin) from two public datasets, using 142 slides with 820,407 spots from STImage-1K4M (Chen et al., 2024a) and 568 slides with 254,812 spots from HEST-1k (Jaume et al., 2024). Across the resulting cohorts, our model achieves performance superior to state-of-the-art models on half of them, whilst providing an interpretable framework for understanding which genetic expressions of a cancer tumour can be captured from the morphology observed in corresponding locations of the histology image.
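The core idea in the abstract, vector-quantised latents from two modality-specific encoders being aligned in a shared codebook space, can be illustrated with a toy sketch. This is not the paper's implementation; the codebook, latent values, and the simple MSE alignment loss below are illustrative assumptions only, reduced to plain Python for clarity.

```python
import math

def quantize(latent, codebook):
    """VQ step: map a continuous latent vector to its nearest codebook entry."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    idx = min(range(len(codebook)), key=lambda i: dist(latent, codebook[i]))
    return idx, codebook[idx]

def alignment_loss(z_img, z_st):
    """Toy alignment objective: MSE between paired latents of the two modalities."""
    return sum((a - b) ** 2 for a, b in zip(z_img, z_st)) / len(z_img)

# Hypothetical shared codebook and paired latents from the two branches
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
z_img = [0.9, 1.1]   # histology-image branch latent (made-up values)
z_st = [1.2, 0.8]    # spatial-transcriptomics branch latent (made-up values)

idx_img, _ = quantize(z_img, codebook)
idx_st, _ = quantize(z_st, codebook)
loss = alignment_loss(z_img, z_st)
# Well-aligned latents quantise to the same codebook index and have low loss.
```

When the two latent spaces are aligned, paired spots from the image and transcriptomics branches should fall near the same codebook entries, which is what lets the image encoder's output be decoded into expression profiles at inference time.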

Cite this Paper


BibTeX
@InProceedings{pmlr-v316-wood26a,
  title = {GenST: A Generative Cross-Modal Model for Predicting Spatial Transcriptomics from Histology Images},
  author = {Wood, Ruby and Hu, Yang and Rittscher, Jens and Li, Bin},
  booktitle = {Proceedings of the MICCAI Workshop on Computational Pathology},
  pages = {52--65},
  year = {2026},
  editor = {Studer, Linda and Ciompi, Francesco and Khalili, Nadieh and Faryna, Khrystyna and Yeong, Joe and Lau, Mai Chan and Chen, Hao and Liu, Ziyi and Brattoli, Biagio},
  volume = {316},
  series = {Proceedings of Machine Learning Research},
  month = {27 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v316/main/assets/wood26a/wood26a.pdf},
  url = {https://proceedings.mlr.press/v316/wood26a.html},
  abstract = {Spatial transcriptomics is used to identify gene expression levels in certain locations across a tissue sample, preserving important spatial information in cancerous tissue samples for downstream clinical decision making. However, this technology is currently too expensive to be used in routine clinical pathways. On the other hand, digital images of haematoxylin and eosin stained histology slides are routinely generated from tissue biopsy samples. Here, we develop a generative cross-modal method to predict spatial transcriptomics from histology images by aligning the latent spaces of two VQ-VAEs, one for each modality. We benchmark our approach on multiple sequencing technologies (Visium and ST) and cancer types (breast, brain, spinal cord and skin) from two public datasets, using 142 slides with 820,407 spots from STImage-1K4M (Chen et al., 2024a) and 568 slides with 254,812 spots from HEST-1k (Jaume et al., 2024). Across the resulting cohorts, our model achieves performance superior to state-of-the-art models on half of them, whilst providing an interpretable framework for understanding which genetic expressions of a cancer tumour can be captured from the morphology observed in corresponding locations of the histology image.}
}
Endnote
%0 Conference Paper
%T GenST: A Generative Cross-Modal Model for Predicting Spatial Transcriptomics from Histology Images
%A Ruby Wood
%A Yang Hu
%A Jens Rittscher
%A Bin Li
%B Proceedings of the MICCAI Workshop on Computational Pathology
%C Proceedings of Machine Learning Research
%D 2026
%E Linda Studer
%E Francesco Ciompi
%E Nadieh Khalili
%E Khrystyna Faryna
%E Joe Yeong
%E Mai Chan Lau
%E Hao Chen
%E Ziyi Liu
%E Biagio Brattoli
%F pmlr-v316-wood26a
%I PMLR
%P 52--65
%U https://proceedings.mlr.press/v316/wood26a.html
%V 316
%X Spatial transcriptomics is used to identify gene expression levels in certain locations across a tissue sample, preserving important spatial information in cancerous tissue samples for downstream clinical decision making. However, this technology is currently too expensive to be used in routine clinical pathways. On the other hand, digital images of haematoxylin and eosin stained histology slides are routinely generated from tissue biopsy samples. Here, we develop a generative cross-modal method to predict spatial transcriptomics from histology images by aligning the latent spaces of two VQ-VAEs, one for each modality. We benchmark our approach on multiple sequencing technologies (Visium and ST) and cancer types (breast, brain, spinal cord and skin) from two public datasets, using 142 slides with 820,407 spots from STImage-1K4M (Chen et al., 2024a) and 568 slides with 254,812 spots from HEST-1k (Jaume et al., 2024). Across the resulting cohorts, our model achieves performance superior to state-of-the-art models on half of them, whilst providing an interpretable framework for understanding which genetic expressions of a cancer tumour can be captured from the morphology observed in corresponding locations of the histology image.
APA
Wood, R., Hu, Y., Rittscher, J. & Li, B. (2026). GenST: A Generative Cross-Modal Model for Predicting Spatial Transcriptomics from Histology Images. Proceedings of the MICCAI Workshop on Computational Pathology, in Proceedings of Machine Learning Research 316:52-65. Available from https://proceedings.mlr.press/v316/wood26a.html.