UltraMAE: Multi-modal Masked Autoencoder for Ultrasound Pre-training

Aimon Rahman, Vishal M. Patel
Proceedings of The 7th International Conference on Medical Imaging with Deep Learning, PMLR 250:1196-1206, 2024.

Abstract

Pre-training on a large dataset such as ImageNet followed by supervised fine-tuning has brought success in various deep learning-based tasks. However, natural images and ultrasound images differ considerably, making pre-training on natural images ineffective for ultrasound-related tasks. In this paper, we introduce a unified masking-based model for both ultrasound images and videos that learns better visual representations than networks trained on a single modality. This is the first large-scale, generalized ultrasound pre-training network that simultaneously utilizes 100,000+ videos and images of different parts of the human anatomy, such as the liver, bones, heart, thyroid, and nerves, making the network an effective pre-trained benchmark model for ultrasound-specific downstream tasks. We propose a novel method for ultrasound image analysis that utilizes an ultrasound-specific confidence map to guide low-level representation learning through masked feature acquisition. Our pre-trained network has demonstrated remarkable efficacy and versatility in tackling both classification and segmentation tasks across a range of ultrasound pathologies, highlighting its potential for widespread adoption and impact in the ultrasound field. In addition, we show that our pre-trained model can be leveraged to learn efficiently with a small number of labeled ultrasound images.
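For readers unfamiliar with masked-autoencoder (MAE) pre-training, the short NumPy sketch below illustrates generic MAE-style random patch masking, the core idea behind masking-based pre-training. It is illustrative only: the patch size, mask ratio, and function names are assumptions, and it does not reproduce UltraMAE's multi-modal pipeline or its confidence-map-guided masking.

import numpy as np

def patchify(image, patch_size=16):
    # Split an (H, W) grayscale frame into non-overlapping flattened patches.
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    patches = image.reshape(h // patch_size, patch_size, w // patch_size, patch_size)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch_size * patch_size)
    return patches  # shape: (num_patches, patch_size ** 2)

def random_masking(patches, mask_ratio=0.75, rng=None):
    # Keep a random subset of patches visible; the rest become reconstruction targets.
    rng = rng if rng is not None else np.random.default_rng(0)
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = rng.permutation(num_patches)
    keep_idx = np.sort(perm[:num_keep])   # indices of visible patches
    mask = np.ones(num_patches, dtype=bool)
    mask[keep_idx] = False                # True = masked (to be reconstructed)
    return patches[keep_idx], keep_idx, mask

# Example with a dummy 224x224 frame standing in for an ultrasound image.
frame = np.random.rand(224, 224).astype(np.float32)
patches = patchify(frame)                 # (196, 256)
visible, keep_idx, mask = random_masking(patches)
print(visible.shape, int(mask.sum()))     # (49, 256) 147

In a standard MAE, only the visible patches are passed through the encoder, and a lightweight decoder reconstructs the masked patches from the encoded tokens plus mask tokens.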

Cite this Paper


BibTeX
@InProceedings{pmlr-v250-rahman24a,
  title     = {UltraMAE: Multi-modal Masked Autoencoder for Ultrasound Pre-training},
  author    = {Rahman, Aimon and Patel, Vishal M.},
  booktitle = {Proceedings of The 7th International Conference on Medical Imaging with Deep Learning},
  pages     = {1196--1206},
  year      = {2024},
  editor    = {Burgos, Ninon and Petitjean, Caroline and Vakalopoulou, Maria and Christodoulidis, Stergios and Coupe, Pierrick and Delingette, Hervé and Lartizien, Carole and Mateus, Diana},
  volume    = {250},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v250/main/assets/rahman24a/rahman24a.pdf},
  url       = {https://proceedings.mlr.press/v250/rahman24a.html},
  abstract  = {Pre-training on a large dataset such as ImageNet followed by supervised fine-tuning has brought success in various deep learning-based tasks. However, the modalities of natural images and ultrasound images have considerable differences, making pre-training on natural images ineffective for ultrasound-related tasks. In this paper, we introduce a unified masking-based model for both ultrasound images and videos that learns better visual representation than the network with single-modality representations. This is the first large-scale generalized ultrasound pre-training network that simultaneously utilizes 100,000+ videos and images of different parts of the human anatomy such as the liver, bones, heart, thyroids, nerves, etc, making the network an effective benchmark pretrained model for any ultrasound-specific downstream tasks. We propose a novel method for ultrasound image analysis that utilizes an ultrasound-specific confidence map to guide low-level representation learning through masked feature acquisition. Our pre-trained network has demonstrated remarkable efficacy and versatility in tackling both classification and segmentation tasks across a range of ultrasound pathologies, highlighting its potential for widespread adoption and impact in the ultrasound field. In addition, we show that our pre-training model can be leveraged to learn efficiently with a small number of labeled ultrasound images.}
}
Endnote
%0 Conference Paper
%T UltraMAE: Multi-modal Masked Autoencoder for Ultrasound Pre-training
%A Aimon Rahman
%A Vishal M. Patel
%B Proceedings of The 7th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ninon Burgos
%E Caroline Petitjean
%E Maria Vakalopoulou
%E Stergios Christodoulidis
%E Pierrick Coupe
%E Hervé Delingette
%E Carole Lartizien
%E Diana Mateus
%F pmlr-v250-rahman24a
%I PMLR
%P 1196--1206
%U https://proceedings.mlr.press/v250/rahman24a.html
%V 250
%X Pre-training on a large dataset such as ImageNet followed by supervised fine-tuning has brought success in various deep learning-based tasks. However, the modalities of natural images and ultrasound images have considerable differences, making pre-training on natural images ineffective for ultrasound-related tasks. In this paper, we introduce a unified masking-based model for both ultrasound images and videos that learns better visual representation than the network with single-modality representations. This is the first large-scale generalized ultrasound pre-training network that simultaneously utilizes 100,000+ videos and images of different parts of the human anatomy such as the liver, bones, heart, thyroids, nerves, etc, making the network an effective benchmark pretrained model for any ultrasound-specific downstream tasks. We propose a novel method for ultrasound image analysis that utilizes an ultrasound-specific confidence map to guide low-level representation learning through masked feature acquisition. Our pre-trained network has demonstrated remarkable efficacy and versatility in tackling both classification and segmentation tasks across a range of ultrasound pathologies, highlighting its potential for widespread adoption and impact in the ultrasound field. In addition, we show that our pre-training model can be leveraged to learn efficiently with a small number of labeled ultrasound images.
APA
Rahman, A. & Patel, V. M. (2024). UltraMAE: Multi-modal Masked Autoencoder for Ultrasound Pre-training. Proceedings of The 7th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 250:1196-1206. Available from https://proceedings.mlr.press/v250/rahman24a.html.