PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization

Alexis Guichemerre; Soufiane Belharbi; Mohammadhadi Shateri; Luke McCaffrey; Eric Granger

Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:547-587, 2026.

Abstract

Weakly supervised object localization (WSOL) methods allow training models to classify images and localize ROIs. WSOL only requires low-cost image-class annotations yet provides a visually interpretable classifier, which is important in histology image analysis. Standard WSOL methods rely on class activation mapping (CAM) methods to produce spatial localization maps according to a single- or two-step strategy. While both strategies have made significant progress, they still face several limitations with histology images. Single-step methods can easily result in under- or over-activation due to the limited visual ROI saliency in histology images and scarce localization cues. They also face the well-known issue of asynchronous convergence between classification and localization tasks. The two-step approach is sub-optimal because it is constrained to a frozen classifier, limiting the capacity for localization. Moreover, these methods also struggle when applied to out-of-distribution (OOD) datasets. In this paper, a multi-task approach for WSOL is introduced for simultaneous training of both tasks to address the asynchronous convergence problem. In particular, localization is performed in the pixel-feature space of an image encoder that is shared with classification. This allows learning discriminant features and accurate delineation of foreground/background regions to support ROI localization and image classification. We propose PixelCAM, a cost-effective foreground/background pixel-wise classifier in the pixel-feature space that allows for spatial object localization. Using partial-cross entropy, PixelCAM is trained using pixel pseudo-labels collected from a pretrained WSOL model. Both image and pixel-wise classifiers are trained simultaneously using standard gradient descent. In addition, our pixel classifier can easily be integrated into CNN- and transformer-based architectures without any modifications. Our extensive experiments on GlaS and CAMELYON16 cancer datasets show that PixelCAM can improve classification and localization performance when integrated with different WSOL methods. Most importantly, it provides robustness on both tasks for OOD data linked to different cancer types, with large domain shifts between training and testing image data.
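
The abstract describes a foreground/background pixel-wise classifier trained with a partial cross-entropy over pixel pseudo-labels. The NumPy sketch below illustrates that general idea only and is not the authors' implementation: the abstract does not give the exact loss, so the linear per-pixel classifier, the `IGNORE` marker for unlabeled pixels, and the names `pixel_logits` and `partial_cross_entropy` are all assumptions made for illustration.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels that have no pseudo-label


def pixel_logits(features, weight, bias):
    """Apply a linear foreground/background classifier at every pixel.

    features: (H, W, D) pixel features, weight: (D, 2), bias: (2,).
    Returns logits of shape (H, W, 2). A linear layer is an assumption here.
    """
    return features @ weight + bias


def partial_cross_entropy(logits, pseudo_labels):
    """Mean cross-entropy over pseudo-labeled pixels only.

    logits: (H, W, 2); pseudo_labels: (H, W) with values 0 (background),
    1 (foreground), or IGNORE. Unlabeled pixels contribute nothing.
    """
    mask = pseudo_labels != IGNORE
    if not mask.any():
        return 0.0
    z = logits[mask]                       # (N, 2) logits of labeled pixels
    y = pseudo_labels[mask]                # (N,) their pseudo-labels
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(y.size), y].mean())


# Toy usage: a 4x4 feature map with only two pseudo-labeled pixels.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 8))
weight = np.zeros((8, 2))
bias = np.zeros(2)
labels = np.full((4, 4), IGNORE)
labels[0, 0] = 1   # one foreground pseudo-label
labels[3, 3] = 0   # one background pseudo-label
loss = partial_cross_entropy(pixel_logits(features, weight, bias), labels)
```

With zero-initialized weights both classes are equally likely at every pixel, so the toy loss equals log 2. In a joint setup of the kind the abstract outlines, a term like this would presumably be added to the image-classification loss and both minimized together by gradient descent; how the two terms are weighted is not stated in the abstract.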

Cite this Paper

BibTeX

@InProceedings{pmlr-v301-guichemerre26a,
  title = 	 {PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization},
  author =       {Guichemerre, Alexis and Belharbi, Soufiane and Shateri, Mohammadhadi and McCaffrey, Luke and Granger, Eric},
  booktitle = 	 {Proceedings of The 8th International Conference on Medical Imaging with Deep Learning},
  pages = 	 {547--587},
  year = 	 {2026},
  editor = 	 {Tasdizen, Tolga and Elhabian, Shireen and Summers, Ronald and Chen, Chen and Koch, Lisa and Zhuang, Yan},
  volume = 	 {301},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--11 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v301/main/assets/guichemerre26a/guichemerre26a.pdf},
  url = 	 {https://proceedings.mlr.press/v301/guichemerre26a.html},
  abstract = 	 {Weakly supervised object localization (WSOL) methods allow training models to classify images and localize ROIs. WSOL only requires low-cost image-class annotations yet provides a visually interpretable classifier, which is important in histology image analysis. Standard WSOL methods rely on class activation mapping (CAM) methods to produce spatial localization maps according to a single- or two-step strategy. While both strategies have made significant progress, they still face several limitations with histology images. Single-step methods can easily result in under- or over-activation due to the limited visual ROI saliency in histology images and scarce localization cues. They also face the well-known issue of asynchronous convergence between classification and localization tasks. The two-step approach is sub-optimal because it is constrained to a frozen classifier, limiting the capacity for localization. Moreover, these methods also struggle when applied to out-of-distribution (OOD) datasets. In this paper, a multi-task approach for WSOL is introduced for simultaneous training of both tasks to address the asynchronous convergence problem. In particular, localization is performed in the pixel-feature space of an image encoder that is shared with classification. This allows learning discriminant features and accurate delineation of foreground/background regions to support ROI localization and image classification. We propose PixelCAM, a cost-effective foreground/background pixel-wise classifier in the pixel-feature space that allows for spatial object localization. Using partial-cross entropy, PixelCAM is trained using pixel pseudo-labels collected from a pretrained WSOL model. Both image and pixel-wise classifiers are trained simultaneously using standard gradient descent. In addition, our pixel classifier can easily be integrated into CNN- and transformer-based architectures without any modifications. Our extensive experiments on GlaS and CAMELYON16 cancer datasets show that PixelCAM can improve classification and localization performance when integrated with different WSOL methods. Most importantly, it provides robustness on both tasks for OOD data linked to different cancer types, with large domain shifts between training and testing image data.}
}

Endnote

%0 Conference Paper
%T PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization
%A Alexis Guichemerre
%A Soufiane Belharbi
%A Mohammadhadi Shateri
%A Luke McCaffrey
%A Eric Granger
%B Proceedings of The 8th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Tolga Tasdizen
%E Shireen Elhabian
%E Ronald Summers
%E Chen Chen
%E Lisa Koch
%E Yan Zhuang
%F pmlr-v301-guichemerre26a
%I PMLR
%P 547--587
%U https://proceedings.mlr.press/v301/guichemerre26a.html
%V 301
%X Weakly supervised object localization (WSOL) methods allow training models to classify images and localize ROIs. WSOL only requires low-cost image-class annotations yet provides a visually interpretable classifier, which is important in histology image analysis. Standard WSOL methods rely on class activation mapping (CAM) methods to produce spatial localization maps according to a single- or two-step strategy. While both strategies have made significant progress, they still face several limitations with histology images. Single-step methods can easily result in under- or over-activation due to the limited visual ROI saliency in histology images and scarce localization cues. They also face the well-known issue of asynchronous convergence between classification and localization tasks. The two-step approach is sub-optimal because it is constrained to a frozen classifier, limiting the capacity for localization. Moreover, these methods also struggle when applied to out-of-distribution (OOD) datasets. In this paper, a multi-task approach for WSOL is introduced for simultaneous training of both tasks to address the asynchronous convergence problem. In particular, localization is performed in the pixel-feature space of an image encoder that is shared with classification. This allows learning discriminant features and accurate delineation of foreground/background regions to support ROI localization and image classification. We propose PixelCAM, a cost-effective foreground/background pixel-wise classifier in the pixel-feature space that allows for spatial object localization. Using partial-cross entropy, PixelCAM is trained using pixel pseudo-labels collected from a pretrained WSOL model. Both image and pixel-wise classifiers are trained simultaneously using standard gradient descent. In addition, our pixel classifier can easily be integrated into CNN- and transformer-based architectures without any modifications. Our extensive experiments on GlaS and CAMELYON16 cancer datasets show that PixelCAM can improve classification and localization performance when integrated with different WSOL methods. Most importantly, it provides robustness on both tasks for OOD data linked to different cancer types, with large domain shifts between training and testing image data.

APA

Guichemerre, A., Belharbi, S., Shateri, M., McCaffrey, L. & Granger, E. (2026). PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization. Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 301:547-587. Available from https://proceedings.mlr.press/v301/guichemerre26a.html.
