TAFIE: Transformer-Assisted Fusion with Integrated Entropy Attention for Multimodal Medical Imaging

Abhinav Sagar
Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 317:229-238, 2026.

Abstract

Multimodal medical image fusion aims to integrate complementary information from different imaging modalities to enhance clinical diagnosis and surgical navigation. While deep learning-based approaches have significantly advanced fusion quality over traditional methods by leveraging powerful feature extraction, challenges such as blurring, noise, and artifacts persist in the fused results. To address these issues, we propose a novel fusion framework that incorporates an entropy-based attention module to emphasize salient image regions. Our architecture is designed in a multi-scale manner, utilizing an adaptive gating mechanism to effectively extract and combine salient features across different scales. Additionally, we introduce a Top-K token vision transformer to enable efficient global feature extraction while reducing computational overhead by restricting the context space. We further demonstrate the effectiveness of our fused representations in the downstream task of oocyte quality prediction, showing improved accuracy over individual focal images as well as over other approaches. Extensive experiments on diverse medical imaging datasets demonstrate that our method achieves competitive performance compared to state-of-the-art techniques, both quantitatively and visually. Ablation studies underscore the importance of each proposed component.
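The abstract names two mechanisms without giving details: an entropy-based attention module and a Top-K token transformer that restricts the context space. The following is a minimal pure-Python sketch of the general ideas only, not the paper's implementation. All function names (`shannon_entropy`, `entropy_weights`, `top_k_attention`) are hypothetical. It assumes that saliency is scored by the Shannon entropy of patch intensities and that Top-K means keeping the k highest-scoring keys per query; the paper may define both differently.

```python
import math
from collections import Counter


def shannon_entropy(patch):
    """Shannon entropy (in bits) of the intensity values in a flat patch."""
    counts = Counter(patch)
    n = len(patch)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_weights(patches):
    """Assumed saliency rule: higher-entropy patches get larger weights."""
    return softmax([shannon_entropy(p) for p in patches])


def top_k_attention(query, keys, values, k):
    """Scaled dot-product attention for one query, restricted to the k
    highest-scoring keys (an assumed reading of 'Top-K token')."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Indices of the k largest scores; all other tokens are ignored.
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in keep])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, keep):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out, keep


if __name__ == "__main__":
    flat, textured = [0, 0, 0, 0], [0, 1, 2, 3]
    print(entropy_weights([flat, textured]))
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(top_k_attention([1.0, 0.0], keys, values, k=2))
```

In this sketch each query attends to only k tokens instead of all of them, so the softmax and weighted sum cover k terms rather than the full sequence, which is one way to read the abstract's claim of reduced computational overhead.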

Cite this Paper


BibTeX
@InProceedings{pmlr-v317-sagar26c,
  title = {TAFIE: Transformer-Assisted Fusion with Integrated Entropy Attention for Multimodal Medical Imaging},
  author = {Sagar, Abhinav},
  booktitle = {Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare},
  pages = {229--238},
  year = {2026},
  editor = {Wu, Junde and Pan, Jiazhen and Zhu, Jiayuan and Luo, Luyang and Li, Yitong and Xu, Min and Jin, Yueming and Rueckert, Daniel},
  volume = {317},
  series = {Proceedings of Machine Learning Research},
  month = {20--21 Jan},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v317/main/assets/sagar26c/sagar26c.pdf},
  url = {https://proceedings.mlr.press/v317/sagar26c.html},
  abstract = {Multimodal medical image fusion aims to integrate complementary information from different imaging modalities to enhance clinical diagnosis and surgical navigation. While deep learning-based approaches have significantly advanced fusion quality over traditional methods by leveraging powerful feature extraction, challenges such as blurring, noise, and artifacts persist in the fused results. To address these issues, we propose a novel fusion framework that incorporates an entropy-based attention module to emphasize salient image regions. Our architecture is designed in a multi-scale manner, utilizing an adaptive gating mechanism to effectively extract and combine salient features across different scales. Additionally, we introduce a Top-K token vision transformer to enable efficient global feature extraction while reducing computational overhead by restricting the context space. We further demonstrate the effectiveness of our fused representations in the downstream task of oocyte quality prediction, showing improved accuracy over individual focal images as well as over other approaches. Extensive experiments on diverse medical imaging datasets demonstrate that our method achieves competitive performance compared to state-of-the-art techniques, both quantitatively and visually. Ablation studies underscore the importance of each proposed component.}
}
Endnote
%0 Conference Paper
%T TAFIE: Transformer-Assisted Fusion with Integrated Entropy Attention for Multimodal Medical Imaging
%A Abhinav Sagar
%B Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare
%C Proceedings of Machine Learning Research
%D 2026
%E Junde Wu
%E Jiazhen Pan
%E Jiayuan Zhu
%E Luyang Luo
%E Yitong Li
%E Min Xu
%E Yueming Jin
%E Daniel Rueckert
%F pmlr-v317-sagar26c
%I PMLR
%P 229--238
%U https://proceedings.mlr.press/v317/sagar26c.html
%V 317
%X Multimodal medical image fusion aims to integrate complementary information from different imaging modalities to enhance clinical diagnosis and surgical navigation. While deep learning-based approaches have significantly advanced fusion quality over traditional methods by leveraging powerful feature extraction, challenges such as blurring, noise, and artifacts persist in the fused results. To address these issues, we propose a novel fusion framework that incorporates an entropy-based attention module to emphasize salient image regions. Our architecture is designed in a multi-scale manner, utilizing an adaptive gating mechanism to effectively extract and combine salient features across different scales. Additionally, we introduce a Top-K token vision transformer to enable efficient global feature extraction while reducing computational overhead by restricting the context space. We further demonstrate the effectiveness of our fused representations in the downstream task of oocyte quality prediction, showing improved accuracy over individual focal images as well as over other approaches. Extensive experiments on diverse medical imaging datasets demonstrate that our method achieves competitive performance compared to state-of-the-art techniques, both quantitatively and visually. Ablation studies underscore the importance of each proposed component.
APA
Sagar, A. (2026). TAFIE: Transformer-Assisted Fusion with Integrated Entropy Attention for Multimodal Medical Imaging. Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, in Proceedings of Machine Learning Research 317:229-238. Available from https://proceedings.mlr.press/v317/sagar26c.html.
