Co-distilled attention guided masked image modeling with noisy teacher for self-supervised learning on medical images

Jue Jiang, Aneesh Rangnekar, Harini Veeraraghavan
Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:679-694, 2026.

Abstract

Masked image modeling (MIM) is a highly effective self-supervised learning (SSL) approach for extracting useful feature representations from unannotated data. The predominantly used random masking methods make SSL less effective for medical images because neighboring patches are contextually similar, which causes information leakage and makes the pretraining task too easy. We therefore propose an attention-guided masking mechanism within a co-distillation learning framework that selectively masks semantically co-occurring and discriminative patches, aiming to reduce information leakage and increase the difficulty of SSL pretraining. However, attention-guided masking inevitably reduces the diversity of attention heads, which negatively impacts downstream task performance. To address this, we integrate a noisy teacher into the co-distillation framework (termed DAGMaN) to enable attentive masking while preserving high attention head diversity. We demonstrate the capability of DAGMaN on multiple tasks, including full- and few-shot lung nodule classification, immunotherapy outcome prediction, tumor segmentation, and unsupervised clustering of organs.
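To make the contrast between random and attention-guided masking concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each patch is scored by the teacher's [CLS]-to-patch attention averaged over heads, and it masks the highest-scoring patches instead of sampling patch indices uniformly at random. The optional Gaussian noise on the scores is a hypothetical stand-in for the noisy teacher; DAGMaN's actual noise mechanism may differ. The helper `head_diversity`, the mean pairwise cosine distance between heads' attention vectors, is likewise an assumed proxy for attention head diversity. All function and parameter names are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence


def attention_guided_mask(
    cls_attention: Sequence[float],
    mask_ratio: float,
    noise_std: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[bool]:
    """Return one flag per patch (True = masked), hiding the most-attended patches.

    Assumption: cls_attention holds one score per patch, e.g. the teacher's
    [CLS]-to-patch attention averaged over heads. noise_std > 0 perturbs the
    scores before ranking; this is a hypothetical stand-in for a noisy teacher.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = rng if rng is not None else random.Random()
    scores = [s + rng.gauss(0.0, noise_std) if noise_std > 0 else s
              for s in cls_attention]
    n_mask = round(mask_ratio * len(scores))
    # Rank patches from most to least attended and mask the top n_mask.
    # Random masking would instead pick n_mask indices uniformly at random.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    masked = set(order[:n_mask])
    return [i in masked for i in range(len(scores))]


def head_diversity(head_attention: Sequence[Sequence[float]]) -> float:
    """Mean pairwise cosine distance between heads' attention vectors.

    Assumed proxy: 0.0 when all heads attend identically, larger when heads
    focus on different patches.
    """
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

    n_heads = len(head_attention)
    if n_heads < 2:
        return 0.0
    pairs = [(i, j) for i in range(n_heads) for j in range(i + 1, n_heads)]
    total = sum(1.0 - cosine(head_attention[i], head_attention[j]) for i, j in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    attn = [0.05, 0.40, 0.10, 0.30, 0.15]
    print(attention_guided_mask(attn, mask_ratio=0.4))  # [False, True, False, True, False]
    print(head_diversity([[1.0, 0.0], [0.0, 1.0]]))     # 1.0
```

With scores [0.05, 0.40, 0.10, 0.30, 0.15] and mask_ratio=0.4, the two most-attended patches (indices 1 and 3) are masked. In this sketch, raising noise_std makes the selection less deterministic, which is one simple way to loosen a purely attention-driven mask.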

Cite this Paper


BibTeX
@InProceedings{pmlr-v301-jiang26a,
  title = {Co-distilled attention guided masked image modeling with noisy teacher for self-supervised learning on medical images},
  author = {Jiang, Jue and Rangnekar, Aneesh and Veeraraghavan, Harini},
  booktitle = {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning},
  pages = {679--694},
  year = {2026},
  editor = {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan},
  volume = {301},
  series = {Proceedings of Machine Learning Research},
  month = {09--11 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v301/main/assets/jiang26a/jiang26a.pdf},
  url = {https://proceedings.mlr.press/v301/jiang26a.html},
  abstract = {Masked image modeling (MIM) is a highly effective self-supervised learning (SSL) approach to extract useful feature representations from unannotated data. Predominantly used random masking methods make SSL less effective for medical images due to the contextual similarity of neighboring patches, leading to information leakage and SSL simplification. Hence, we propose an attention guided masking mechanism within a co-distillation learning framework to selectively mask semantically co-occurring and discriminative patches, aiming to reduce information leakage and increase the difficulty of SSL pretraining. However, attention guided masking inevitably reduces the diversity of attention heads, which negatively impacts downstream task performance. To address this, we integrate a noisy teacher into the co-distillation framework (termed DAGMaN) to enable attentive masking while preserving high attention head diversity. We demonstrate the capability of DAGMaN on multiple tasks including full- and few-shot lung nodule classification, immunotherapy outcome prediction, tumor segmentation, and unsupervised clustering of organs.}
}

Endnote
%0 Conference Paper
%T Co-distilled attention guided masked image modeling with noisy teacher for self-supervised learning on medical images
%A Jue Jiang
%A Aneesh Rangnekar
%A Harini Veeraraghavan
%B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Tolga Tasdizen
%E Shireen Elhabian
%E Ronald Summers
%E Chen Chen
%E Lisa Koch
%E Yan Zhuang
%F pmlr-v301-jiang26a
%I PMLR
%P 679--694
%U https://proceedings.mlr.press/v301/jiang26a.html
%V 301
%X Masked image modeling (MIM) is a highly effective self-supervised learning (SSL) approach to extract useful feature representations from unannotated data. Predominantly used random masking methods make SSL less effective for medical images due to the contextual similarity of neighboring patches, leading to information leakage and SSL simplification. Hence, we propose an attention guided masking mechanism within a co-distillation learning framework to selectively mask semantically co-occurring and discriminative patches, aiming to reduce information leakage and increase the difficulty of SSL pretraining. However, attention guided masking inevitably reduces the diversity of attention heads, which negatively impacts downstream task performance. To address this, we integrate a noisy teacher into the co-distillation framework (termed DAGMaN) to enable attentive masking while preserving high attention head diversity. We demonstrate the capability of DAGMaN on multiple tasks including full- and few-shot lung nodule classification, immunotherapy outcome prediction, tumor segmentation, and unsupervised clustering of organs.

APA
Jiang, J., Rangnekar, A., & Veeraraghavan, H. (2026). Co-distilled attention guided masked image modeling with noisy teacher for self-supervised learning on medical images. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:679-694. Available from https://proceedings.mlr.press/v301/jiang26a.html.
