Scalable Detection of Undiagnosed ILD in Population Screening: A Multi-Cohort Study using 3D Foundation Models

Niccolò McConnell, Mehran Azimbagirad, Daryl O. Cheng, Daisuke Yamada, Ryoko Egashira, Robert Chapman, John McCabe, Shanshan Wang, David Lynch, Greg Kinney, Pardeep Vasudev, Paul Taylor, Daniel C. Alexander, Sam M. Janes, Joseph Jacob
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:4579-4599, 2026.

Abstract

Undiagnosed interstitial lung disease (UILD), an early form of lung fibrosis, is increasingly detected in population-based low-dose computed tomography (LDCT) screening but remains systematically under-reported because of its subtle appearance. We developed and validated a foundation-model-augmented deep learning system for UILD detection across two of the largest thoracic CT cohorts worldwide: SUMMIT, the UK’s largest LDCT screening study (>11,000 scans), and COPDGene, a multi-centre US cohort spanning 21 scanners and >8,800 scans. We propose ViT-3D-TE, a multi-token 3D Vision Transformer that fuses CLS, MAX, and AVG tokens to preserve both high-frequency focal texture and diffuse parenchymal change. The model was initialised with TANGERINE, an open-source 3D masked autoencoder pretrained on 98,000 full-volume LDCT scans, which provides volumetric priors essential for stable optimisation. ViT-3D-TE was trained solely on SUMMIT and evaluated on COPDGene without domain adaptation, achieving strong performance (internal: AUROC 0.9805, AUPRC 0.7699; external: AUROC 0.9705, AUPRC 0.6170). Because a random classifier's AUPRC equals the positive prevalence, these AUPRC values are roughly 17× and 25× the random baselines at the clinically realistic cohort prevalences of 4.6% and 2.5%, respectively. We further introduce ConvNeXt-2.5-MIL, a slice-based 2.5D alternative that performs competitively without relying on 3D foundation model pretraining. Together, these results provide, to our knowledge, the largest real-world validation to date of deep learning for UILD detection and demonstrate that foundation-model-enhanced 3D Transformers offer a practical and scalable pathway for integrating UILD detection into national LDCT screening workflows.
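The abstract names CLS, MAX, and AVG token fusion but does not say how the three tokens are combined. The sketch below is one plausible reading, not the authors' implementation. It assumes that `tokens[0]` is the CLS token, that MAX and AVG are element-wise reductions over the remaining patch tokens only, and that the three vectors are concatenated. Both `fuse_tokens` and `auprc_lift` are hypothetical names. The second helper reproduces the headline ratios using the standard fact that a random classifier's expected AUPRC equals the positive prevalence. Plain Python lists keep the example self-contained; a real model would operate on tensors.

```python
from typing import List, Sequence


def fuse_tokens(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical CLS/MAX/AVG readout over ViT output tokens.

    Assumption: tokens[0] is the CLS token and tokens[1:] are patch tokens.
    Returns the concatenation [CLS | MAX | AVG], of length 3 * dim.
    """
    if len(tokens) < 2:
        raise ValueError("need a CLS token and at least one patch token")
    cls_tok = list(tokens[0])
    patches = tokens[1:]
    dim = len(cls_tok)
    # Element-wise max over patch tokens: keeps the strongest local response,
    # a plausible carrier for focal texture.
    max_tok = [max(p[d] for p in patches) for d in range(dim)]
    # Element-wise mean over patch tokens: summarises change spread across
    # the volume, a plausible carrier for diffuse parenchymal change.
    avg_tok = [sum(p[d] for p in patches) / len(patches) for d in range(dim)]
    return cls_tok + max_tok + avg_tok


def auprc_lift(auprc: float, prevalence: float) -> float:
    """Ratio of AUPRC to a random classifier's AUPRC, which equals prevalence."""
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    return auprc / prevalence


if __name__ == "__main__":
    # One CLS token plus two 2-dimensional patch tokens.
    print(fuse_tokens([[0.1, 0.2], [1.0, -1.0], [3.0, 0.0]]))
    print(round(auprc_lift(0.7699, 0.046)), round(auprc_lift(0.6170, 0.025)))
```

For example, `round(auprc_lift(0.7699, 0.046))` gives 17 and `round(auprc_lift(0.6170, 0.025))` gives 25, which match the internal and external figures quoted in the abstract. Concatenation keeps the three summaries separable for a downstream linear head. Summation or learned gating would be equally plausible fusion choices under this reading.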

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-mcconnell26a,
  title = {Scalable Detection of Undiagnosed ILD in Population Screening: A Multi-Cohort Study using 3D Foundation Models},
  author = {McConnell, Niccol\`o and Azimbagirad, Mehran and Cheng, Daryl O. and Yamada, Daisuke and Egashira, Ryoko and Chapman, Robert and McCabe, John and Wang, Shanshan and Lynch, David and Kinney, Greg and Vasudev, Pardeep and Taylor, Paul and Alexander, Daniel C. and Janes, Sam M. and Jacob, Joseph},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = {4579--4599},
  year = {2026},
  editor = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = {315},
  series = {Proceedings of Machine Learning Research},
  month = {08--10 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/mcconnell26a/mcconnell26a.pdf},
  url = {https://proceedings.mlr.press/v315/mcconnell26a.html},
  abstract = {Undiagnosed interstitial lung disease (UILD), an early form of lung fibrosis, is increasingly detected in population-based low-dose computed tomography (LDCT) screening but remains systematically under-reported due to its subtle appearance. We developed and validated a foundation-model-augmented deep learning system for UILD detection across two of the largest thoracic CT cohorts worldwide: SUMMIT, the UK’s largest LDCT screening study ($>$11{,}000 scans), and COPDGene, a multi-centre US cohort spanning 21 scanners and $>$8{,}800 scans. We propose ViT-3D-TE, a multi-token 3D Vision Transformer designed to preserve both high-frequency focal texture and diffuse parenchymal change through CLS, MAX, and AVG token fusion. The model was initialised with TANGERINE, an open-source 3D masked autoencoder pretrained on 98{,}000 full-volume LDCT scans, providing volumetric priors essential for stable optimisation. ViT-3D-TE was trained solely on SUMMIT and evaluated on COPDGene without domain adaptation, and achieved strong performance (AUROC 0.9805, AUPRC 0.7699 internal; AUROC 0.9705, AUPRC 0.6170 external), representing 17$\times$ and 25$\times$ improvements over random baselines at clinically realistic cohort prevalences (4.6% and 2.5%). We further introduce ConvNeXt-2.5-MIL, a slice-based 2.5D alternative that performs competitively without relying on 3D foundation model pretraining. Together, these results provide, to our knowledge, the largest real-world validation to date of deep learning for UILD detection and demonstrate that foundation-model-enhanced 3D Transformers offer a practical and scalable pathway for integrating UILD detection into national LDCT screening workflows.}
}
Endnote
%0 Conference Paper
%T Scalable Detection of Undiagnosed ILD in Population Screening: A Multi-Cohort Study using 3D Foundation Models
%A Niccolò McConnell
%A Mehran Azimbagirad
%A Daryl O. Cheng
%A Daisuke Yamada
%A Ryoko Egashira
%A Robert Chapman
%A John McCabe
%A Shanshan Wang
%A David Lynch
%A Greg Kinney
%A Pardeep Vasudev
%A Paul Taylor
%A Daniel C. Alexander
%A Sam M. Janes
%A Joseph Jacob
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-mcconnell26a
%I PMLR
%P 4579--4599
%U https://proceedings.mlr.press/v315/mcconnell26a.html
%V 315
%X Undiagnosed interstitial lung disease (UILD), an early form of lung fibrosis, is increasingly detected in population-based low-dose computed tomography (LDCT) screening but remains systematically under-reported due to its subtle appearance. We developed and validated a foundation-model-augmented deep learning system for UILD detection across two of the largest thoracic CT cohorts worldwide: SUMMIT, the UK’s largest LDCT screening study ($>$11{,}000 scans), and COPDGene, a multi-centre US cohort spanning 21 scanners and $>$8{,}800 scans. We propose ViT-3D-TE, a multi-token 3D Vision Transformer designed to preserve both high-frequency focal texture and diffuse parenchymal change through CLS, MAX, and AVG token fusion. The model was initialised with TANGERINE, an open-source 3D masked autoencoder pretrained on 98{,}000 full-volume LDCT scans, providing volumetric priors essential for stable optimisation. ViT-3D-TE was trained solely on SUMMIT and evaluated on COPDGene without domain adaptation, and achieved strong performance (AUROC 0.9805, AUPRC 0.7699 internal; AUROC 0.9705, AUPRC 0.6170 external), representing 17$\times$ and 25$\times$ improvements over random baselines at clinically realistic cohort prevalences (4.6% and 2.5%). We further introduce ConvNeXt-2.5-MIL, a slice-based 2.5D alternative that performs competitively without relying on 3D foundation model pretraining. Together, these results provide, to our knowledge, the largest real-world validation to date of deep learning for UILD detection and demonstrate that foundation-model-enhanced 3D Transformers offer a practical and scalable pathway for integrating UILD detection into national LDCT screening workflows.
APA
McConnell, N., Azimbagirad, M., Cheng, D.O., Yamada, D., Egashira, R., Chapman, R., McCabe, J., Wang, S., Lynch, D., Kinney, G., Vasudev, P., Taylor, P., Alexander, D.C., Janes, S.M. & Jacob, J. (2026). Scalable Detection of Undiagnosed ILD in Population Screening: A Multi-Cohort Study using 3D Foundation Models. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:4579-4599. Available from https://proceedings.mlr.press/v315/mcconnell26a.html.

Related Material