A Quantitative Evaluation Protocol for Assessing the Clinical Usefulness of 3D Saliency Explanations for MRI-based Alzheimer’s Classification

Tamal Chakroborty, Yang Liu
Proceedings of the 39th Canadian Conference on Artificial Intelligence, PMLR 318:983-989, 2026.

Abstract

While Explainable AI (XAI) is widely considered essential for building clinical trust in MRI-based 3D deep learning models for Alzheimer’s disease (AD) detection, the clinical validation of these explanations is insufficiently rigorous. Current evaluation protocols for assessing clinical usefulness rely mainly on subjective visual inspection or on limited ‘top-k’ attribution regional overlap measures. These methods do not offer a standardized benchmark, making it difficult to objectively determine which explanation method most accurately aligns with the complex and distributed nature of neurodegenerative pathology. To address this gap, this paper proposes a quantitative evaluation protocol for assessing the clinical usefulness of 3D saliency maps through metric-based anatomical alignment. We implement a comprehensive scoring system based on AD neuropathology that assigns clinical importance weights to anatomical regions, allowing for mathematical verification of explanation integrity. We employ a variety of ranking and alignment metrics to evaluate five gradient-based XAI methods: Grad-CAM, Grad-CAM++, HiResCAM, Backpropagation, and Guided Backpropagation, applied to a pre-trained 3D DenseNet architecture. Our findings reveal notable disparities in usefulness that visual inspection and the existing regional overlap protocol often fail to detect. Among the XAI methods, Grad-CAM++ demonstrated considerable instability and poor alignment with clinical relevance, while Backpropagation and Guided Backpropagation displayed superior spatial consistency by effectively prioritizing clinically significant biomarkers. This protocol provides a structured approach for evaluating explanation methods, advancing empirical alignment between XAI outputs and established pathological evidence.
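
As a rough illustration of what "metric-based anatomical alignment" could look like, the sketch below aggregates a 3D saliency map per atlas region and compares the resulting region ranking with a set of clinical importance weights using a Spearman rank correlation. This is not the authors' protocol: the function names, the toy atlas, the weight values, and the choice of Spearman correlation are all assumptions made for illustration, since the abstract does not specify the metrics or weights used.

```python
import numpy as np

def region_scores(saliency, atlas, region_ids):
    """Mean absolute saliency inside each labelled atlas region."""
    sal = np.abs(saliency)
    return np.array([sal[atlas == r].mean() for r in region_ids])

def rank(values):
    """Ordinal ranks (0 = smallest). Assumes no ties, for simplicity."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

# Toy atlas (hypothetical): a 4x4x4 volume split into four labelled regions.
atlas = np.zeros((4, 4, 4), dtype=int)
atlas[:2, :2, :] = 1
atlas[:2, 2:, :] = 2
atlas[2:, :2, :] = 3
atlas[2:, 2:, :] = 4
region_ids = [1, 2, 3, 4]

# Hypothetical clinical importance weights, one per region (not from the paper).
clinical_weights = np.array([1.0, 0.6, 0.3, 0.1])

# Synthetic saliency map whose intensity follows the weights region by region.
saliency = np.zeros(atlas.shape)
for r, w in zip(region_ids, clinical_weights):
    saliency[atlas == r] = w

# Synthetic saliency map with the opposite ordering of regions.
saliency_reversed = np.zeros(atlas.shape)
for r, w in zip(region_ids, clinical_weights[::-1]):
    saliency_reversed[atlas == r] = w

scores = region_scores(saliency, atlas, region_ids)
alignment = spearman(scores, clinical_weights)
alignment_reversed = spearman(
    region_scores(saliency_reversed, atlas, region_ids), clinical_weights
)
```

In this toy setup the first map ranks regions exactly as the weights do, so its alignment is at the maximum, while the reversed map sits at the minimum. A real evaluation would use a registered anatomical atlas and saliency maps produced by the explanation methods under study.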

Cite this Paper


BibTeX
@InProceedings{pmlr-v318-chakroborty26a,
  title = {A Quantitative Evaluation Protocol for Assessing the Clinical Usefulness of 3D Saliency Explanations for MRI-based Alzheimer’s Classification},
  author = {Chakroborty, Tamal and Liu, Yang},
  booktitle = {Proceedings of the 39th Canadian Conference on Artificial Intelligence},
  pages = {983--989},
  year = {2026},
  editor = {Bouzar-Benlabiod, Lydia and Leung, Carson},
  volume = {318},
  series = {Proceedings of Machine Learning Research},
  month = {25--29 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v318/main/assets/chakroborty26a/chakroborty26a.pdf},
  url = {https://proceedings.mlr.press/v318/chakroborty26a.html},
  abstract = {While Explainable AI (XAI) is widely considered essential for building clinical trust in MRI-based 3D deep learning models for Alzheimer’s disease (AD) detection, the clinical validation of these explanations is insufficiently rigorous. Current evaluation protocols for assessing clinical usefulness rely mainly on subjective visual inspection or on limited ‘top-k’ attribution regional overlap measures. These methods do not offer a standardized benchmark, making it difficult to objectively determine which explanation method most accurately aligns with the complex and distributed nature of neurodegenerative pathology. To address this gap, this paper proposes a quantitative evaluation protocol for assessing the clinical usefulness of 3D saliency maps through metric-based anatomical alignment. We implement a comprehensive scoring system based on AD neuropathology that assigns clinical importance weights to anatomical regions, allowing for mathematical verification of explanation integrity. We employ a variety of ranking and alignment metrics to evaluate five gradient-based XAI methods: Grad-CAM, Grad-CAM++, HiResCAM, Backpropagation, and Guided Backpropagation, applied to a pre-trained 3D DenseNet architecture. Our findings reveal notable disparities in usefulness that visual inspection and the existing regional overlap protocol often fail to detect. Among the XAI methods, Grad-CAM++ demonstrated considerable instability and poor alignment with clinical relevance, while Backpropagation and Guided Backpropagation displayed superior spatial consistency by effectively prioritizing clinically significant biomarkers. This protocol provides a structured approach for evaluating explanation methods, advancing empirical alignment between XAI outputs and established pathological evidence.}
}
Endnote
%0 Conference Paper
%T A Quantitative Evaluation Protocol for Assessing the Clinical Usefulness of 3D Saliency Explanations for MRI-based Alzheimer’s Classification
%A Tamal Chakroborty
%A Yang Liu
%B Proceedings of the 39th Canadian Conference on Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2026
%E Lydia Bouzar-Benlabiod
%E Carson Leung
%F pmlr-v318-chakroborty26a
%I PMLR
%P 983--989
%U https://proceedings.mlr.press/v318/chakroborty26a.html
%V 318
%X While Explainable AI (XAI) is widely considered essential for building clinical trust in MRI-based 3D deep learning models for Alzheimer’s disease (AD) detection, the clinical validation of these explanations is insufficiently rigorous. Current evaluation protocols for assessing clinical usefulness rely mainly on subjective visual inspection or on limited ‘top-k’ attribution regional overlap measures. These methods do not offer a standardized benchmark, making it difficult to objectively determine which explanation method most accurately aligns with the complex and distributed nature of neurodegenerative pathology. To address this gap, this paper proposes a quantitative evaluation protocol for assessing the clinical usefulness of 3D saliency maps through metric-based anatomical alignment. We implement a comprehensive scoring system based on AD neuropathology that assigns clinical importance weights to anatomical regions, allowing for mathematical verification of explanation integrity. We employ a variety of ranking and alignment metrics to evaluate five gradient-based XAI methods: Grad-CAM, Grad-CAM++, HiResCAM, Backpropagation, and Guided Backpropagation, applied to a pre-trained 3D DenseNet architecture. Our findings reveal notable disparities in usefulness that visual inspection and the existing regional overlap protocol often fail to detect. Among the XAI methods, Grad-CAM++ demonstrated considerable instability and poor alignment with clinical relevance, while Backpropagation and Guided Backpropagation displayed superior spatial consistency by effectively prioritizing clinically significant biomarkers. This protocol provides a structured approach for evaluating explanation methods, advancing empirical alignment between XAI outputs and established pathological evidence.
APA
Chakroborty, T. & Liu, Y. (2026). A Quantitative Evaluation Protocol for Assessing the Clinical Usefulness of 3D Saliency Explanations for MRI-based Alzheimer’s Classification. Proceedings of the 39th Canadian Conference on Artificial Intelligence, in Proceedings of Machine Learning Research 318:983-989. Available from https://proceedings.mlr.press/v318/chakroborty26a.html.
