X-Cardia: Phenotype-Guided Cross-Modal Alignment for Opportunistic Cardiac Screening on Routine Chest CT
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:1740-1767, 2026.
Abstract
Deep learning models for cardiac prognostics often operate within single-modality frameworks, limiting their ability to capture physiologically meaningful cross-modal relationships. We focus on non-gated, non-contrast chest computed tomography (CT) scans, which are typically acquired for entirely non-cardiac indications rather than for dedicated cardiac assessment. This setting is intrinsically challenging: the lack of cardiac gating obscures the cardiac phase and the absence of contrast limits the visibility of cardiovascular structures, yet these scans represent a rich resource for opportunistic cardiac screening. We introduce X-Cardia, a phenotype-guided multimodal alignment framework that transfers structural cardiac phenotypes from echocardiography (ECHO) and electrocardiography (ECG) into CT representations by enforcing explicit phenotype-level consistency. The approach combines CLIP-style contrastive pre-training, which aligns image and tabular embeddings, with a non-parametric Nadaraya–Watson phenotype head that uses a support bank to guide the latent space toward clinically meaningful axes. This enables the image encoder to learn physiological features that generalize beyond modality boundaries. We pre-train on data from 20,574 patients and fine-tune the resulting image encoder on ten cardiac abnormality prediction tasks. The proposed method consistently outperforms both standard contrastive learning and a baseline without pre-training, achieving an AUROC gain of up to 8% on the test set. In the 5-shot setting, phenotype-guided alignment improves AUROC by an average of 9.8% over baselines, demonstrating strong data efficiency and generalization from few labeled samples. Our results show that explicit phenotype-guided alignment yields interpretable, data-efficient representations that transfer cardiac knowledge to non-cardiac CTs, establishing a promising paradigm for multimodal medical imaging.
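The Nadaraya–Watson phenotype head described above can be illustrated with a minimal sketch: a query embedding's phenotypes are predicted as a similarity-weighted average over a support bank of exemplar embeddings. This is an illustrative reconstruction based on standard Nadaraya–Watson kernel regression, not the authors' released implementation; the function name, temperature value, and toy data are assumptions.

```python
import numpy as np

def nw_phenotype_head(query, bank_emb, bank_phen, tau=0.1):
    """Predict phenotypes for `query` via kernel regression over a support bank.

    query:     (d,) L2-normalized embedding of one CT scan
    bank_emb:  (n, d) L2-normalized support-bank embeddings
    bank_phen: (n, p) phenotype values per exemplar (e.g. ECHO/ECG measurements)
    tau:       softmax temperature controlling the kernel bandwidth
    """
    sims = bank_emb @ query / tau        # scaled cosine similarities to the bank
    w = np.exp(sims - sims.max())        # numerically stable softmax weights
    w /= w.sum()
    return w @ bank_phen                 # (p,) similarity-weighted phenotype average

# Toy usage: 4 exemplars, 2-dim embeddings, 1 hypothetical phenotype column.
bank = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
bank /= np.linalg.norm(bank, axis=1, keepdims=True)
phen = np.array([[60.0], [40.0], [50.0], [30.0]])
query = np.array([1.0, 0.0])
pred = nw_phenotype_head(query, bank, phen)  # pulled toward the nearest exemplars
```

Because the prediction is a convex combination of support-bank phenotypes, gradients from a phenotype-consistency loss flow back into the image encoder and pull CT embeddings toward exemplars with similar ECHO/ECG measurements.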