Explainable Medical Image Segmentation via Attention-Gated Fusion of Vision Transformers and U-Nets
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:272-283, 2026.
Abstract
Medical image segmentation is essential for helping medical professionals locate anomalies in images. Current segmentation frameworks, however, lack explainability, which leaves clinicians with little insight into how segmentation decisions are made and how the segmentation target is identified. In this paper, we present a framework that helps medical professionals locate anomalies while providing visual explanations in the form of heatmaps of the target. We propose a dual-encoder architecture that combines a U-Net encoder with a Vision Transformer to perform accurate segmentation. An attention fusion mechanism fuses the embeddings of both encoders and generates an explainability heatmap that offers improved highlighting of important features. We discuss how our approach advances the state of the art for medical decision making in comparison with other current research, and how it can be of value for distinct healthcare concerns. While our current results focus on how the dual-encoder approach yields significant benefit, we also briefly discuss how textual explanations could be integrated alongside the heatmaps as a valuable step for future work.
Keywords: Explainable AI, Medical Applications of AI, Computer Vision Segmentation, AI for Social Good, Transformers, Attention.
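The idea of fusing two encoder embeddings through an attention gate that also serves as a heatmap can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes an additive attention gate in the style of Attention U-Net, that both encoders produce feature maps of the same shape, and random stand-in features and weights. The names `attention_gated_fusion`, `w_cnn`, `w_vit`, and `psi` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_gated_fusion(f_cnn, f_vit, w_cnn, w_vit, psi):
    """Fuse two feature maps with a per-pixel attention gate (illustrative only).

    f_cnn, f_vit : (C, H, W) features from the two encoders (assumed same shape)
    w_cnn, w_vit : (K, C) weights acting like 1x1 convolutions
    psi          : (K,) weights that reduce the joint features to one gate value
    """
    # Project both feature maps into a shared K-dimensional space, pixel by pixel.
    joint = np.einsum("kc,chw->khw", w_cnn, f_cnn) + np.einsum("kc,chw->khw", w_vit, f_vit)
    joint = np.maximum(joint, 0.0)  # ReLU

    # One gate value in (0, 1) per pixel; this map doubles as the heatmap.
    gate = sigmoid(np.einsum("k,khw->hw", psi, joint))

    # Convex combination of the two encoders' features at each pixel.
    fused = gate[None, :, :] * f_cnn + (1.0 - gate[None, :, :]) * f_vit
    return fused, gate


# Stand-in data: random features and weights, not real encoder outputs.
rng = np.random.default_rng(0)
C, H, W, K = 8, 16, 16, 4
f_cnn = rng.standard_normal((C, H, W))
f_vit = rng.standard_normal((C, H, W))
w_cnn = 0.1 * rng.standard_normal((K, C))
w_vit = 0.1 * rng.standard_normal((K, C))
psi = rng.standard_normal(K)

fused, heatmap = attention_gated_fusion(f_cnn, f_vit, w_cnn, w_vit, psi)
print(fused.shape, heatmap.shape)
```

In this sketch the gate decides, at each pixel, how much weight the convolutional features receive relative to the transformer features, so the same map can be upsampled and overlaid on the input image as a visual explanation. In a trained model the weights would be learned rather than sampled at random.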