Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training

Jeya Maria Jose Valanarasu, Yucheng Tang, Dong Yang, Ziyue Xu, Can Zhao, Wenqi Li, Vishal M. Patel, Bennett Allan Landman, Daguang Xu, Yufan He, Vishwesh Nath
Proceedings of The 7th International Conference on Medical Imaging with Deep Learning, PMLR 250:1553-1570, 2024.

Abstract

Harnessing the power of pre-training on large-scale datasets like ImageNet forms a fundamental building block for the progress of representation learning-driven solutions in computer vision. Medical images are inherently different from natural images as they are acquired in the form of many modalities (CT, MR, PET, Ultrasound etc.) and contain granulated information like tissue, lesion, organs etc. These characteristics of medical images require special attention towards learning features representative of local context. In this work, we focus on designing an effective pre-training framework for 3D radiology images. First, we propose a new masking strategy called local masking where the masking is performed across channel embeddings instead of tokens to improve the learning of local feature representations. We combine this with classical low-level perturbations like adding noise and downsampling to further enable low-level representation learning. To this end, we introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. We curate a large-scale dataset to enable pre-training of 3D medical radiology images (MRI and CT). The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance. Notably, our proposed method tops the public test leaderboard of BTCV multi-organ segmentation challenge. Our code can be found here: https://github.com/Project-MONAI/research-contributions/tree/main/DAE.
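The contrast the abstract draws between token masking and local masking can be illustrated with a toy NumPy sketch. This is only an illustration of the idea as stated in the abstract, not the authors' implementation: the array shapes, mask ratio, and noise scale are arbitrary choices, and the real framework operates on 3D volumes inside a full encoder-decoder pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy patch embeddings: (num_tokens, embed_dim). Shapes are illustrative.
tokens, dim = 8, 16
x = rng.standard_normal((tokens, dim))
mask_ratio = 0.5

# MAE-style token masking: entire tokens are zeroed, removing all
# information at those spatial locations.
token_mask = rng.random(tokens) < mask_ratio
x_token_masked = x.copy()
x_token_masked[token_mask, :] = 0.0

# Local masking as described in the abstract: mask across the channel
# embeddings instead, so every token keeps a subset of its channels
# and local spatial context is never fully removed.
channel_mask = rng.random(dim) < mask_ratio
x_local_masked = x.copy()
x_local_masked[:, channel_mask] = 0.0

# One of the classical low-level perturbations mentioned in the
# abstract (additive noise); the pre-training target would be to
# reconstruct the original x from this disrupted input.
x_disrupted = x_local_masked + 0.1 * rng.standard_normal(x.shape)
```

Note that after local masking, no row of `x_local_masked` is entirely zero, whereas token masking zeroes whole rows; this is the sense in which local masking preserves per-location information.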

Cite this Paper


BibTeX
@InProceedings{pmlr-v250-valanarasu24a,
  title = {Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training},
  author = {Valanarasu, Jeya Maria Jose and Tang, Yucheng and Yang, Dong and Xu, Ziyue and Zhao, Can and Li, Wenqi and Patel, Vishal M. and Landman, Bennett Allan and Xu, Daguang and He, Yufan and Nath, Vishwesh},
  booktitle = {Proceedings of The 7th International Conference on Medical Imaging with Deep Learning},
  pages = {1553--1570},
  year = {2024},
  editor = {Burgos, Ninon and Petitjean, Caroline and Vakalopoulou, Maria and Christodoulidis, Stergios and Coupe, Pierrick and Delingette, Hervé and Lartizien, Carole and Mateus, Diana},
  volume = {250},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v250/main/assets/valanarasu24a/valanarasu24a.pdf},
  url = {https://proceedings.mlr.press/v250/valanarasu24a.html},
  abstract = {Harnessing the power of pre-training on large-scale datasets like ImageNet forms a fundamental building block for the progress of representation learning-driven solutions in computer vision. Medical images are inherently different from natural images as they are acquired in the form of many modalities (CT, MR, PET, Ultrasound etc.) and contain granulated information like tissue, lesion, organs etc. These characteristics of medical images require special attention towards learning features representative of local context. In this work, we focus on designing an effective pre-training framework for 3D radiology images. First, we propose a new masking strategy called local masking where the masking is performed across channel embeddings instead of tokens to improve the learning of local feature representations. We combine this with classical low-level perturbations like adding noise and downsampling to further enable low-level representation learning. To this end, we introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. We curate a large-scale dataset to enable pre-training of 3D medical radiology images (MRI and CT). The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance. Notably, our proposed method tops the public test leaderboard of BTCV multi-organ segmentation challenge. Our code can be found here: https://github.com/Project-MONAI/research-contributions/tree/main/DAE.}
}
Endnote
%0 Conference Paper
%T Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training
%A Jeya Maria Jose Valanarasu
%A Yucheng Tang
%A Dong Yang
%A Ziyue Xu
%A Can Zhao
%A Wenqi Li
%A Vishal M. Patel
%A Bennett Allan Landman
%A Daguang Xu
%A Yufan He
%A Vishwesh Nath
%B Proceedings of The 7th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ninon Burgos
%E Caroline Petitjean
%E Maria Vakalopoulou
%E Stergios Christodoulidis
%E Pierrick Coupe
%E Hervé Delingette
%E Carole Lartizien
%E Diana Mateus
%F pmlr-v250-valanarasu24a
%I PMLR
%P 1553--1570
%U https://proceedings.mlr.press/v250/valanarasu24a.html
%V 250
%X Harnessing the power of pre-training on large-scale datasets like ImageNet forms a fundamental building block for the progress of representation learning-driven solutions in computer vision. Medical images are inherently different from natural images as they are acquired in the form of many modalities (CT, MR, PET, Ultrasound etc.) and contain granulated information like tissue, lesion, organs etc. These characteristics of medical images require special attention towards learning features representative of local context. In this work, we focus on designing an effective pre-training framework for 3D radiology images. First, we propose a new masking strategy called local masking where the masking is performed across channel embeddings instead of tokens to improve the learning of local feature representations. We combine this with classical low-level perturbations like adding noise and downsampling to further enable low-level representation learning. To this end, we introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. We curate a large-scale dataset to enable pre-training of 3D medical radiology images (MRI and CT). The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance. Notably, our proposed method tops the public test leaderboard of BTCV multi-organ segmentation challenge. Our code can be found here: https://github.com/Project-MONAI/research-contributions/tree/main/DAE.
APA
Valanarasu, J.M.J., Tang, Y., Yang, D., Xu, Z., Zhao, C., Li, W., Patel, V.M., Landman, B.A., Xu, D., He, Y. & Nath, V. (2024). Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training. Proceedings of The 7th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 250:1553-1570. Available from https://proceedings.mlr.press/v250/valanarasu24a.html.