Unpaired Multimodal Learning for Biological Datasets

Zongliang Ji, Cian Eastwood, Anna Goldenberg, Paul Pu Liang, Jason Hartford, Rahul G. Krishnan, Emmanuel Noutahi
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:1840-1868, 2026.

Abstract

Multimodal learning holds tremendous promise for biology, providing a path to integrate diverse data types and ultimately construct a more complete picture of underlying biological mechanisms. However, most existing approaches for multimodal learning require paired samples—an impractical assumption in biology, where measurement devices often destroy samples (e.g., RNA sequencing). To address this challenge, we introduce IntraPair InterCluster (IPIC), a novel contrastive approach for multimodal learning that departs from traditional reliance on paired data by requiring only treatment-group labels. IPIC aligns modalities through intra-treatment group matching and inter-treatment group clustering, producing embeddings that are both accurate and biologically meaningful. In experiments on four curated multimodal biological datasets, IPIC consistently outperforms baseline approaches, highlighting its effectiveness in leveraging independently collected single-modality datasets for multimodal contrastive pre-training.
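The page does not spell out the objective, but the abstract's description (cross-modal positives defined by shared treatment-group labels rather than sample pairing) resembles a supervised-contrastive-style loss. As an illustrative sketch only, with the loss form, function name, and toy data all assumptions rather than the authors' exact method, such an objective might look like:

```python
import numpy as np

def group_contrastive_loss(z_a, z_b, groups_a, groups_b, tau=0.5):
    """Illustrative cross-modal contrastive loss using only group labels.

    z_a, z_b: (n, d) embeddings from two modalities (samples are unpaired).
    groups_a, groups_b: (n,) integer treatment-group labels.
    Any cross-modal pair sharing a group label is treated as a positive;
    pairs from different groups act as negatives via the softmax denominator.
    """
    # L2-normalize each modality's embeddings
    za = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    zb = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    sim = za @ zb.T / tau                              # cross-modal similarities
    pos = (groups_a[:, None] == groups_b[None, :]).astype(float)
    # log-softmax over all candidates in the other modality
    log_p = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # per-anchor: average negative log-likelihood of same-group matches
    per_anchor = -(pos * log_p).sum(axis=1) / pos.sum(axis=1)
    return float(per_anchor.mean())

# Toy check: two treatment groups with distinct signatures per modality.
g = np.array([0, 0, 1, 1])
z_a = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
z_b_aligned = np.array([[0.95, 0.05], [1.0, 0.0], [0.05, 0.95], [0.0, 1.0]])
z_b_flipped = z_b_aligned[::-1]  # group structure inverted across modalities

aligned = group_contrastive_loss(z_a, z_b_aligned, g, g)
flipped = group_contrastive_loss(z_a, z_b_flipped, g, g)
```

When the two modalities agree on group structure the loss is lower than when the group assignments are scrambled, which is the property a label-driven alignment objective is meant to exploit.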

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-ji26a,
  title     = {Unpaired Multimodal Learning for Biological Datasets},
  author    = {Ji, Zongliang and Eastwood, Cian and Goldenberg, Anna and Liang, Paul Pu and Hartford, Jason and Krishnan, Rahul G. and Noutahi, Emmanuel},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages     = {1840--1868},
  year      = {2026},
  editor    = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume    = {315},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/ji26a/ji26a.pdf},
  url       = {https://proceedings.mlr.press/v315/ji26a.html},
  abstract  = {Multimodal learning holds tremendous promise for biology, providing a path to integrate diverse data types and ultimately construct a more complete picture of underlying biological mechanisms. However, most existing approaches for multimodal learning require paired samples—an impractical assumption in biology, where measurement devices often destroy samples (e.g., RNA sequencing). To address this challenge, we introduce IntraPair InterCluster (IPIC), a novel contrastive approach for multimodal learning that departs from traditional reliance on paired data by requiring only treatment-group labels. IPIC aligns modalities through intra-treatment group matching and inter-treatment group clustering, producing embeddings that are both accurate and biologically meaningful. In experiments on four curated multimodal biological datasets, IPIC consistently outperforms baseline approaches, highlighting its effectiveness in leveraging independently collected single-modality datasets for multimodal contrastive pre-training.}
}
Endnote
%0 Conference Paper
%T Unpaired Multimodal Learning for Biological Datasets
%A Zongliang Ji
%A Cian Eastwood
%A Anna Goldenberg
%A Paul Pu Liang
%A Jason Hartford
%A Rahul G. Krishnan
%A Emmanuel Noutahi
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-ji26a
%I PMLR
%P 1840--1868
%U https://proceedings.mlr.press/v315/ji26a.html
%V 315
%X Multimodal learning holds tremendous promise for biology, providing a path to integrate diverse data types and ultimately construct a more complete picture of underlying biological mechanisms. However, most existing approaches for multimodal learning require paired samples—an impractical assumption in biology, where measurement devices often destroy samples (e.g., RNA sequencing). To address this challenge, we introduce IntraPair InterCluster (IPIC), a novel contrastive approach for multimodal learning that departs from traditional reliance on paired data by requiring only treatment-group labels. IPIC aligns modalities through intra-treatment group matching and inter-treatment group clustering, producing embeddings that are both accurate and biologically meaningful. In experiments on four curated multimodal biological datasets, IPIC consistently outperforms baseline approaches, highlighting its effectiveness in leveraging independently collected single-modality datasets for multimodal contrastive pre-training.
APA
Ji, Z., Eastwood, C., Goldenberg, A., Liang, P.P., Hartford, J., Krishnan, R.G. & Noutahi, E. (2026). Unpaired Multimodal Learning for Biological Datasets. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:1840-1868. Available from https://proceedings.mlr.press/v315/ji26a.html.