PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization
Proceedings of The 8th International Conference on Medical Imaging with Deep Learning, PMLR 301:547-587, 2026.
Abstract
Weakly supervised object localization (WSOL) methods allow training models to classify images and localize regions of interest (ROIs). WSOL only requires low-cost image-class annotations, yet it provides a visually interpretable classifier, which is important in histology image analysis. Standard WSOL methods rely on class activation mapping (CAM) methods to produce spatial localization maps according to a single- or two-step strategy. While both strategies have made significant progress, they still face several limitations with histology images. Single-step methods can easily result in under- or over-activation due to the limited visual ROI saliency in histology images and scarce localization cues. They also face the well-known issue of asynchronous convergence between the classification and localization tasks. The two-step approach is sub-optimal because it is constrained to a frozen classifier, limiting the capacity for localization. Moreover, these methods struggle when applied to out-of-distribution (OOD) datasets. In this paper, a multi-task approach for WSOL is introduced that trains both tasks simultaneously to address the asynchronous convergence problem. In particular, localization is performed in the pixel-feature space of an image encoder that is shared with classification. This allows learning discriminant features and accurately delineating foreground/background regions to support ROI localization and image classification. We propose PixelCAM, a cost-effective foreground/background pixel-wise classifier in the pixel-feature space that allows for spatial object localization. PixelCAM is trained with a partial cross-entropy loss on pixel pseudo-labels collected from a pretrained WSOL model. Both the image classifier and the pixel-wise classifier are trained simultaneously using standard gradient descent. In addition, our pixel classifier can easily be integrated into CNN- and transformer-based architectures without any modifications.
Our extensive experiments on the GlaS and CAMELYON16 cancer datasets show that PixelCAM can improve classification and localization performance when integrated with different WSOL methods. Most importantly, it provides robustness in both tasks on OOD data from different cancer types, with large domain shifts between training and testing image data.
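The training signal summarized in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It only shows the general idea: a linear foreground/background classifier applied to every pixel feature of a shared encoder map, trained with partial cross-entropy, where pixels without a pseudo-label are ignored, and combined with an image-level cross-entropy. Several details are assumptions made for illustration and do not come from the paper: the global-average-pooled image head, the `IGNORE = -1` marker, the loss weight `lam`, and all function names. In practice, both terms would be minimized jointly by gradient descent in a deep-learning framework.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels that received no pseudo-label


def softmax(logits, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pixel_logits(features, w_pix, b_pix):
    """Foreground/background logits for every pixel.

    features: (H, W, D) pixel-feature map from the shared image encoder.
    w_pix: (D, 2), b_pix: (2,) form a linear classifier applied to each
    pixel feature independently (equivalent to a 1x1 convolution).
    """
    return features @ w_pix + b_pix  # (H, W, 2); class 0 = background, 1 = foreground


def partial_cross_entropy(logits, pseudo_labels):
    """Mean cross-entropy over pseudo-labeled pixels only.

    pseudo_labels: (H, W) ints in {0, 1, IGNORE}; IGNORE pixels add nothing.
    """
    mask = pseudo_labels != IGNORE
    if not mask.any():
        return 0.0
    probs = softmax(logits)[mask]        # (N, 2) probabilities of labeled pixels
    labels = pseudo_labels[mask]         # (N,)
    picked = probs[np.arange(labels.size), labels]
    return float(-np.log(picked + 1e-12).mean())


def image_cross_entropy(image_logits, label):
    """Standard cross-entropy for one image-level prediction."""
    return float(-np.log(softmax(image_logits)[label] + 1e-12))


def multitask_loss(features, w_img, b_img, w_pix, b_pix,
                   image_label, pseudo_labels, lam=1.0):
    """Image-classification loss plus weighted pixel-level partial CE.

    Both heads read the same `features`, so in a real model the gradients
    of both terms would update the shared encoder together.
    """
    pooled = features.mean(axis=(0, 1))  # assumed global average pooling -> (D,)
    l_img = image_cross_entropy(pooled @ w_img + b_img, image_label)
    l_pix = partial_cross_entropy(pixel_logits(features, w_pix, b_pix), pseudo_labels)
    return l_img + lam * l_pix


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, D, C = 4, 4, 8, 2              # toy sizes; C image classes
    features = rng.normal(size=(H, W, D))
    pseudo = np.full((H, W), IGNORE)
    pseudo[0, 0] = 1                     # a confident foreground pixel
    pseudo[3, 3] = 0                     # a confident background pixel
    loss = multitask_loss(features, rng.normal(size=(D, C)), np.zeros(C),
                          rng.normal(size=(D, 2)), np.zeros(2),
                          image_label=1, pseudo_labels=pseudo)
    print(f"toy multi-task loss: {loss:.4f}")
```

Masking unlabeled pixels out of the pixel loss is what makes the cross-entropy "partial". Only the pixels that a pretrained WSOL model marks confidently as foreground or background supervise the pixel classifier. The remaining pixels contribute no gradient.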