SoftCAM: Making black box models self-explainable for medical image analysis

Kerol Djoumessi, Philipp Berens
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:433-467, 2026.

Abstract

Convolutional neural networks (CNNs) are widely used for high-stakes applications like medicine, often surpassing human performance. However, most explanation methods rely on post-hoc attribution, approximating the decision-making process of already trained black-box models. These methods are often sensitive, unreliable, and fail to reflect true model reasoning, limiting their trustworthiness in critical applications. In this work, we introduce SoftCAM, a straightforward yet effective approach that makes standard CNN architectures inherently interpretable. By removing the global average pooling layer and replacing the fully connected classification layer with a convolution-based class evidence layer, SoftCAM preserves spatial information and produces explicit class activation maps that form the basis of the model’s predictions. Evaluated on three medical datasets spanning three imaging modalities, SoftCAM maintains classification performance while significantly improving both the qualitative and quantitative explanation compared to existing post-hoc methods.
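The architectural change described in the abstract can be illustrated with a small sketch. The snippet below is a minimal pure-Python illustration, not the authors' implementation: assuming the class-evidence layer is a 1×1 convolution over the backbone feature maps, it shows that spatially averaging the per-class evidence maps reproduces the logits of the usual GAP-then-fully-connected head, while the evidence maps themselves remain available as explicit class activation maps.

```python
import random

random.seed(0)
C, H, W, K = 4, 3, 3, 2  # feature channels, spatial size, number of classes

# Backbone feature maps feats[c][y][x], plus the head's weights and biases
feats = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
weight = [[random.gauss(0, 1) for _ in range(C)] for _ in range(K)]  # 1x1 conv weights
bias = [random.gauss(0, 1) for _ in range(K)]

# Class evidence maps: a 1x1 convolution yields one spatial map per class,
# which is the basis of the model's explanation
evidence = [
    [[sum(weight[k][c] * feats[c][y][x] for c in range(C)) + bias[k]
      for x in range(W)] for y in range(H)]
    for k in range(K)
]

# Prediction: spatially average each class evidence map
logits_softcam = [sum(v for row in evidence[k] for v in row) / (H * W)
                  for k in range(K)]

# Standard black-box head: global average pooling, then a fully connected layer
gap = [sum(v for row in feats[c] for v in row) / (H * W) for c in range(C)]
logits_blackbox = [sum(weight[k][c] * gap[c] for c in range(C)) + bias[k]
                   for k in range(K)]

# By linearity, both orderings produce the same logits; SoftCAM simply
# keeps the spatial evidence that GAP would otherwise discard
assert all(abs(a - b) < 1e-9 for a, b in zip(logits_softcam, logits_blackbox))
```

The equivalence holds because pooling and the 1×1 convolution are both linear, so their order can be swapped without changing the prediction, which is consistent with the abstract's claim that classification performance is maintained.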

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-djoumessi26a,
  title     = {SoftCAM: Making black box models self-explainable for medical image analysis},
  author    = {Djoumessi, Kerol and Berens, Philipp},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages     = {433--467},
  year      = {2026},
  editor    = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume    = {315},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/djoumessi26a/djoumessi26a.pdf},
  url       = {https://proceedings.mlr.press/v315/djoumessi26a.html},
  abstract  = {Convolutional neural networks (CNNs) are widely used for high-stakes applications like medicine, often surpassing human performance. However, most explanation methods rely on post-hoc attribution, approximating the decision-making process of already trained black-box models. These methods are often sensitive, unreliable, and fail to reflect true model reasoning, limiting their trustworthiness in critical applications. In this work, we introduce SoftCAM, a straightforward yet effective approach that makes standard CNN architectures inherently interpretable. By removing the global average pooling layer and replacing the fully connected classification layer with a convolution-based class evidence layer, SoftCAM preserves spatial information and produces explicit class activation maps that form the basis of the model’s predictions. Evaluated on three medical datasets spanning three imaging modalities, SoftCAM maintains classification performance while significantly improving both the qualitative and quantitative explanation compared to existing post-hoc methods.}
}
Endnote
%0 Conference Paper
%T SoftCAM: Making black box models self-explainable for medical image analysis
%A Kerol Djoumessi
%A Philipp Berens
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-djoumessi26a
%I PMLR
%P 433--467
%U https://proceedings.mlr.press/v315/djoumessi26a.html
%V 315
%X Convolutional neural networks (CNNs) are widely used for high-stakes applications like medicine, often surpassing human performance. However, most explanation methods rely on post-hoc attribution, approximating the decision-making process of already trained black-box models. These methods are often sensitive, unreliable, and fail to reflect true model reasoning, limiting their trustworthiness in critical applications. In this work, we introduce SoftCAM, a straightforward yet effective approach that makes standard CNN architectures inherently interpretable. By removing the global average pooling layer and replacing the fully connected classification layer with a convolution-based class evidence layer, SoftCAM preserves spatial information and produces explicit class activation maps that form the basis of the model’s predictions. Evaluated on three medical datasets spanning three imaging modalities, SoftCAM maintains classification performance while significantly improving both the qualitative and quantitative explanation compared to existing post-hoc methods.
APA
Djoumessi, K. & Berens, P. (2026). SoftCAM: Making black box models self-explainable for medical image analysis. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:433-467. Available from https://proceedings.mlr.press/v315/djoumessi26a.html.