TAFIE: Transformer-Assisted Fusion with Integrated Entropy Attention for Multimodal Medical Imaging
Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 317:229-238, 2026.
Abstract
Multimodal medical image fusion integrates complementary information from different imaging modalities to support clinical diagnosis and surgical navigation. While deep learning-based approaches have substantially improved fusion quality over traditional methods through powerful feature extraction, blurring, noise, and artifacts still persist in fused results. To address these issues, we propose a novel fusion framework that incorporates an entropy-based attention module to emphasize salient image regions. The architecture is designed in a multi-scale manner, using an adaptive gating mechanism to effectively extract and combine salient features across scales. In addition, we introduce a Top-K token vision transformer that enables efficient global feature extraction while reducing computational overhead by restricting the context space. We further demonstrate the utility of our fused representations on the downstream task of oocyte quality prediction, where they yield higher accuracy than individual focal images and than other approaches. Extensive experiments on diverse medical imaging datasets show that our method achieves competitive performance against state-of-the-art techniques, both quantitatively and visually. Ablation studies underscore the contribution of each proposed component.
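To make the "restricted context space" idea concrete, the following is a minimal NumPy sketch of generic Top-K token attention: a per-token saliency score (here standing in for an entropy-based score) selects the k most informative tokens, and attention is computed only over that reduced set. All function names and the scoring choice are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topk_token_attention(q, kv_tokens, scores, k):
    """Attend a single query over only the k highest-scoring tokens.

    q:         (d,)   query vector
    kv_tokens: (n, d) candidate key/value tokens
    scores:    (n,)   per-token saliency scores (e.g. local entropy)
    k:         number of tokens to keep as context
    """
    idx = np.argsort(scores)[-k:]                  # indices of the top-k tokens
    kv = kv_tokens[idx]                            # (k, d) reduced context
    weights = softmax(kv @ q / np.sqrt(q.shape[0]))  # (k,) attention weights
    return weights @ kv                            # (d,) attended output

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))
saliency = rng.standard_normal(16)
query = rng.standard_normal(8)
out = topk_token_attention(query, tokens, saliency, k=4)
```

Because attention cost is quadratic in the number of tokens attended over, shrinking the context from n to k tokens reduces that cost accordingly, which is the efficiency argument made in the abstract.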