X-Cardia: Phenotype-Guided Cross-Modal Alignment for Opportunistic Cardiac Screening on Routine Chest CT
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:1740-1767, 2026.
Abstract
Deep learning models for cardiac prognostics often operate within single-modality frameworks, limiting their ability to capture physiologically meaningful cross-modal relationships. We focus on non-gated, non-contrast chest computed tomography (CT) scans, which are typically acquired for entirely non-cardiac indications rather than for dedicated cardiac assessment. This setting is intrinsically challenging: the lack of cardiac gating obscures the cardiac phase and the absence of contrast limits the visibility of cardiovascular structures, yet these scans represent a rich resource for opportunistic cardiac screening. We introduce X-Cardia, a phenotype-guided multimodal alignment framework that transfers structural cardiac phenotypes from echocardiography (ECHO) and electrocardiography (ECG) into CT representations by enforcing explicit phenotype-level consistency. The approach combines CLIP-style contrastive pre-training, which aligns image and tabular embeddings, with a non-parametric Nadaraya–Watson phenotype head that uses a support bank to guide the latent space toward clinically meaningful axes. This enables the image encoder to learn physiological features that generalize beyond modality boundaries. We pre-train on data from 20,574 patients and fine-tune the resulting image encoder on ten cardiac abnormality prediction tasks. The proposed method consistently outperforms both standard contrastive learning and a baseline without pre-training, achieving an AUROC gain of up to 8% on the test set. In the 5-shot setting, phenotype-guided alignment improves AUROC by an average of 9.8% over baselines, demonstrating strong data efficiency and generalization from few labeled samples. Our results show that explicit phenotype-guided alignment yields interpretable, data-efficient representations that transfer cardiac knowledge to non-cardiac CTs, establishing a promising paradigm for multimodal medical imaging.
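The Nadaraya–Watson phenotype head described above can be illustrated with a minimal sketch: a query embedding's phenotypes are predicted as a similarity-weighted average over a support bank of exemplar embeddings. This is an illustrative reconstruction based on standard Nadaraya–Watson kernel regression, not the authors' released implementation; the function name, temperature value, and toy data are assumptions.

```python
import numpy as np

def nw_phenotype_head(query, bank_emb, bank_phen, tau=0.1):
    """Predict phenotypes for `query` via kernel regression over a support bank.

    query:     (d,) L2-normalized embedding of one CT scan
    bank_emb:  (n, d) L2-normalized support-bank embeddings
    bank_phen: (n, p) phenotype values per exemplar (e.g. ECHO/ECG measurements)
    tau:       softmax temperature controlling the kernel bandwidth
    """
    sims = bank_emb @ query / tau        # scaled cosine similarities to the bank
    w = np.exp(sims - sims.max())        # numerically stable softmax weights
    w /= w.sum()
    return w @ bank_phen                 # (p,) similarity-weighted phenotype average

# Toy usage: 4 exemplars, 2-dim embeddings, 1 hypothetical phenotype column.
bank = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
bank /= np.linalg.norm(bank, axis=1, keepdims=True)
phen = np.array([[60.0], [40.0], [50.0], [30.0]])
query = np.array([1.0, 0.0])
pred = nw_phenotype_head(query, bank, phen)  # pulled toward the nearest exemplars
```

Because the prediction is a convex combination of support-bank phenotypes, gradients from a phenotype-consistency loss flow back into the image encoder and pull CT embeddings toward exemplars with similar ECHO/ECG measurements.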