LILE: Look In-Depth before Looking Elsewhere – A Dual Attention Network using Transformers for Cross-Modal Information Retrieval in Histopathology Archives

Danial Maleki, H.R. Tizhoosh
Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, PMLR 172:879-894, 2022.

Abstract

The volume of available data has grown dramatically in recent years across many applications, and the era of networks that process each modality separately has practically ended. Bidirectional cross-modal retrieval has therefore become a requirement in many domains and research disciplines. This is especially true in medicine, where data comes in many forms, including various types of images and reports as well as molecular data. Most contemporary works apply cross attention to highlight the essential elements of an image or a text with respect to the other modality and attempt to match them. However, these approaches usually weight the features of each modality equally, regardless of their importance within their own modality. In this study, self-attention is proposed as an additional loss term to enrich the internal representations fed into the cross-attention module. The work introduces a novel architecture with a new loss term that helps represent images and texts in a joint latent space. Experimental results on two benchmark datasets, MS-COCO and ARCH, show the effectiveness of the proposed method.
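To make the dual-attention idea in the abstract concrete, below is a minimal PyTorch sketch of the general pattern it describes: each modality first attends to itself ("look in-depth"), and the enriched features are then aligned with the other modality via cross-attention ("look elsewhere"). The class name, dimensions, and layer choices are illustrative assumptions, not the authors' LILE implementation, and the paper's proposed self-attention loss term is not reproduced here.

```python
# Sketch of the dual-attention pattern described in the abstract.
# All names and hyperparameters are hypothetical, for illustration only.
import torch
import torch.nn as nn


class DualAttentionBlock(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Self-attention enriches features within one modality.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-attention aligns those features with the other modality.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # "Look in-depth": attend within the modality first ...
        enriched, _ = self.self_attn(x, x, x)
        x = self.norm1(x + enriched)
        # ... then "look elsewhere": attend to the other modality.
        attended, _ = self.cross_attn(x, other, other)
        return self.norm2(x + attended)


if __name__ == "__main__":
    image_tokens = torch.randn(2, 49, 256)  # e.g., patch features of 2 images
    text_tokens = torch.randn(2, 32, 256)   # e.g., token features of 2 captions
    # One shared block is used here for brevity; separate per-modality
    # blocks would be an equally reasonable design.
    block = DualAttentionBlock()
    fused_image = block(image_tokens, text_tokens)  # image attends to text
    fused_text = block(text_tokens, image_tokens)   # text attends to image
    print(fused_image.shape, fused_text.shape)
```

In a retrieval setting, the two fused outputs would typically be pooled into a joint latent space and trained with a matching objective; per the abstract, LILE additionally supervises the self-attention stage with its own loss term.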

Cite this Paper

BibTeX
@InProceedings{pmlr-v172-maleki22a,
  title     = {LILE: Look In-Depth before Looking Elsewhere – A Dual Attention Network using Transformers for Cross-Modal Information Retrieval in Histopathology Archives},
  author    = {Maleki, Danial and Tizhoosh, H.R.},
  booktitle = {Proceedings of The 5th International Conference on Medical Imaging with Deep Learning},
  pages     = {879--894},
  year      = {2022},
  editor    = {Konukoglu, Ender and Menze, Bjoern and Venkataraman, Archana and Baumgartner, Christian and Dou, Qi and Albarqouni, Shadi},
  volume    = {172},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--08 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v172/maleki22a/maleki22a.pdf},
  url       = {https://proceedings.mlr.press/v172/maleki22a.html}
}
Endnote
%0 Conference Paper
%T LILE: Look In-Depth before Looking Elsewhere – A Dual Attention Network using Transformers for Cross-Modal Information Retrieval in Histopathology Archives
%A Danial Maleki
%A H.R. Tizhoosh
%B Proceedings of The 5th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Ender Konukoglu
%E Bjoern Menze
%E Archana Venkataraman
%E Christian Baumgartner
%E Qi Dou
%E Shadi Albarqouni
%F pmlr-v172-maleki22a
%I PMLR
%P 879--894
%U https://proceedings.mlr.press/v172/maleki22a.html
%V 172
APA
Maleki, D. & Tizhoosh, H.R. (2022). LILE: Look In-Depth before Looking Elsewhere – A Dual Attention Network using Transformers for Cross-Modal Information Retrieval in Histopathology Archives. Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 172:879-894. Available from https://proceedings.mlr.press/v172/maleki22a.html.
