Analysis of Transformers for Medical Image Retrieval

Arvapalli Sai Susmitha, Vinay P. Namboodiri
Proceedings of The 7th International Conference on Medical Imaging with Deep Learning, PMLR 250:1497-1512, 2024.

Abstract

This paper investigates the application of transformers to medical image retrieval. Although various methods have been attempted in this domain, transformers have not been extensively explored. Leveraging vision transformers, we consider co-attention between image tokens. Two main aspects are investigated: the analysis of various architectures and parameters for transformers and the evaluation of explanation techniques. Specifically, we employ contrastive learning to retrieve attention-based images that consider the relationships between query and database images. Our experiments on diverse medical datasets, such as ISIC 2017, COVID-19 chest X-ray, and Kvasir, using multiple transformer architectures, demonstrate superior performance compared to convolution-based methods and transformers using cross-entropy losses. Further, we conducted a quantitative evaluation of various state-of-the-art explanation techniques using insertion-deletion metrics, in addition to basic qualitative assessments. Among these methods, Transformer Input Sampling (TIS) stands out, showcasing superior performance and enhancing interpretability, thus distinguishing it from black-box models.

Cite this Paper


BibTeX
@InProceedings{pmlr-v250-susmitha24a,
  title     = {Analysis of Transformers for Medical Image Retrieval},
  author    = {Susmitha, Arvapalli Sai and Namboodiri, Vinay P.},
  booktitle = {Proceedings of The 7th International Conference on Medical Imaging with Deep Learning},
  pages     = {1497--1512},
  year      = {2024},
  editor    = {Burgos, Ninon and Petitjean, Caroline and Vakalopoulou, Maria and Christodoulidis, Stergios and Coupe, Pierrick and Delingette, Hervé and Lartizien, Carole and Mateus, Diana},
  volume    = {250},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v250/main/assets/susmitha24a/susmitha24a.pdf},
  url       = {https://proceedings.mlr.press/v250/susmitha24a.html},
  abstract  = {This paper investigates the application of transformers to medical image retrieval. Although various methods have been attempted in this domain, transformers have not been extensively explored. Leveraging vision transformers, we consider co-attention between image tokens. Two main aspects are investigated: the analysis of various architectures and parameters for transformers and the evaluation of explanation techniques. Specifically, we employ contrastive learning to retrieve attention-based images that consider the relationships between query and database images. Our experiments on diverse medical datasets, such as ISIC 2017, COVID-19 chest X-ray, and Kvasir, using multiple transformer architectures, demonstrate superior performance compared to convolution-based methods and transformers using cross-entropy losses. Further, we conducted a quantitative evaluation of various state-of-the-art explanation techniques using insertion-deletion metrics, in addition to basic qualitative assessments. Among these methods, Transformer Input Sampling (TIS) stands out, showcasing superior performance and enhancing interpretability, thus distinguishing it from black-box models.}
}
Endnote
%0 Conference Paper
%T Analysis of Transformers for Medical Image Retrieval
%A Arvapalli Sai Susmitha
%A Vinay P. Namboodiri
%B Proceedings of The 7th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ninon Burgos
%E Caroline Petitjean
%E Maria Vakalopoulou
%E Stergios Christodoulidis
%E Pierrick Coupe
%E Hervé Delingette
%E Carole Lartizien
%E Diana Mateus
%F pmlr-v250-susmitha24a
%I PMLR
%P 1497--1512
%U https://proceedings.mlr.press/v250/susmitha24a.html
%V 250
%X This paper investigates the application of transformers to medical image retrieval. Although various methods have been attempted in this domain, transformers have not been extensively explored. Leveraging vision transformers, we consider co-attention between image tokens. Two main aspects are investigated: the analysis of various architectures and parameters for transformers and the evaluation of explanation techniques. Specifically, we employ contrastive learning to retrieve attention-based images that consider the relationships between query and database images. Our experiments on diverse medical datasets, such as ISIC 2017, COVID-19 chest X-ray, and Kvasir, using multiple transformer architectures, demonstrate superior performance compared to convolution-based methods and transformers using cross-entropy losses. Further, we conducted a quantitative evaluation of various state-of-the-art explanation techniques using insertion-deletion metrics, in addition to basic qualitative assessments. Among these methods, Transformer Input Sampling (TIS) stands out, showcasing superior performance and enhancing interpretability, thus distinguishing it from black-box models.
APA
Susmitha, A. S. & Namboodiri, V. P. (2024). Analysis of Transformers for Medical Image Retrieval. Proceedings of The 7th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 250:1497-1512. Available from https://proceedings.mlr.press/v250/susmitha24a.html.

Related Material