Scalable Detection of Undiagnosed ILD in Population Screening: A Multi-Cohort Study using 3D Foundation Models
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:4579-4599, 2026.
Abstract
Undiagnosed interstitial lung disease (UILD), an early form of lung fibrosis, is increasingly detected in population-based low-dose computed tomography (LDCT) screening but remains systematically under-reported due to its subtle appearance. We developed and validated a foundation-model-augmented deep learning system for UILD detection across two of the largest thoracic CT cohorts worldwide: SUMMIT, the UK’s largest LDCT screening study (>11,000 scans), and COPDGene, a multi-centre US cohort spanning 21 scanners and >8,800 scans. We propose ViT-3D-TE, a multi-token 3D Vision Transformer designed to preserve both high-frequency focal texture and diffuse parenchymal change through CLS, MAX, and AVG token fusion. The model was initialised with TANGERINE, an open-source 3D masked autoencoder pretrained on 98,000 full-volume LDCT scans, providing volumetric priors essential for stable optimisation. ViT-3D-TE was trained solely on SUMMIT and evaluated on COPDGene without domain adaptation. It achieved strong performance both internally (AUROC 0.9805, AUPRC 0.7699) and externally (AUROC 0.9705, AUPRC 0.6170), representing 17× and 25× improvements in AUPRC over random baselines at clinically realistic cohort prevalences (4.6% and 2.5%, respectively). We further introduce ConvNeXt-2.5-MIL, a slice-based 2.5D alternative that performs competitively without relying on 3D foundation model pretraining. Together, these results provide, to our knowledge, the largest real-world validation to date of deep learning for UILD detection and demonstrate that foundation-model-enhanced 3D Transformers offer a practical and scalable pathway for integrating UILD detection into national LDCT screening workflows.
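The 17× and 25× figures above can be checked with simple arithmetic. The sketch below assumes they are ratios of the reported AUPRC to the chance-level AUPRC, which for a random classifier is approximately the positive-class prevalence. The abstract does not name the metric behind the multiples, so that reading is an assumption, and the names `REPORTED`, `auprc_lift`, and `LIFTS` are hypothetical helpers introduced only for illustration.

```python
# Values copied from the abstract. Assumption: the "random baseline" is a
# classifier whose expected AUPRC equals the positive-class prevalence.
REPORTED = {
    "SUMMIT (internal)": {"auprc": 0.7699, "prevalence": 0.046},
    "COPDGene (external)": {"auprc": 0.6170, "prevalence": 0.025},
}


def auprc_lift(auprc: float, prevalence: float) -> float:
    """Return the ratio of a model's AUPRC to the chance-level AUPRC."""
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    return auprc / prevalence


# Lift per cohort, keyed by the same labels as REPORTED.
LIFTS = {name: auprc_lift(**values) for name, values in REPORTED.items()}

if __name__ == "__main__":
    for name, lift in LIFTS.items():
        print(f"{name}: {lift:.1f}x over chance")
```

Under this assumption the ratios come out near 16.7 and 24.7, which round to the stated 17× and 25×. Dividing by prevalence matters here because AUPRC, unlike AUROC, has a chance level that depends on how rare the positive class is, so raw AUPRC values from cohorts with different prevalences are not directly comparable.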