Context-Aware Patch Representations for Multiple Instance Learning

Andreas Lolos, Theofilos Christodoulou, Aris L. Moustakas, Stergios Christodoulidis, Maria Vakalopoulou
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:510-543, 2026.

Abstract

In computational pathology, weak supervision has become the standard for deep learning due to the gigapixel scale of whole-slide images (WSIs) and the scarcity of pixel-level annotations, with Multiple Instance Learning (MIL) established as the principal framework for slide-level model training. In this paper, we introduce CAPRMIL, a novel setting for MIL methods, inspired by advances in Neural Partial Differential Equation (PDE) solvers. Instead of relying on complex attention-based aggregation, we propose an efficient, aggregator-agnostic framework that removes the complexity of correlation learning from the MIL aggregator. CAPRMIL produces rich context-aware patch embeddings that promote effective correlation learning on downstream tasks. By projecting patch features (extracted with a frozen patch encoder) into a small set of global context- and morphology-aware tokens and applying multi-head self-attention, CAPRMIL injects global context with linear computational complexity with respect to the bag size. Paired with a simple Mean MIL aggregator, CAPRMIL matches state-of-the-art (SOTA) slide-level performance across multiple public pathology benchmarks, while reducing the total number of trainable parameters by 48%–92.8% versus SOTA MILs, lowering FLOPs during inference by 52%–99%, and ranking among the best models in GPU memory efficiency and training time. Our results indicate that learning rich, context-aware instance representations before aggregation is an effective and scalable alternative to complex pooling for whole-slide analysis.
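The mechanism described in the abstract can be illustrated with a short, hypothetical PyTorch sketch: patch features from a frozen encoder are summarized into a small set of learnable global tokens, the tokens are mixed with multi-head self-attention, and the result is written back to the patches before a simple mean aggregator. All module names, dimensions, and design details below are illustrative assumptions and not the paper's actual implementation; the sketch only shows how attending over K global tokens keeps the cost linear in the bag size N.

# Hypothetical sketch of the idea in the abstract; all names and dimensions are assumptions.
import torch
import torch.nn as nn


class ContextAwarePatchEncoder(nn.Module):
    def __init__(self, dim: int = 512, num_tokens: int = 16, num_heads: int = 8):
        super().__init__()
        # K learnable global context tokens (K << N, the number of patches in a bag)
        self.global_tokens = nn.Parameter(torch.randn(1, num_tokens, dim) * 0.02)
        # patches -> tokens: each global token attends over all patch features (O(N*K))
        self.summarize = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # token <-> token mixing: self-attention over the K tokens only (O(K^2))
        self.mix = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # tokens -> patches: each patch attends over the K tokens (O(N*K))
        self.broadcast = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (B, N, dim) features produced by a frozen patch encoder
        b = patches.shape[0]
        tokens = self.global_tokens.expand(b, -1, -1)
        tokens, _ = self.summarize(tokens, patches, patches)   # pool slide-level context
        tokens, _ = self.mix(tokens, tokens, tokens)            # global token mixing
        context, _ = self.broadcast(patches, tokens, tokens)    # inject context into patches
        return self.norm(patches + context)                     # context-aware patch embeddings


class MeanMIL(nn.Module):
    """Simple mean aggregator and linear classifier on top of the patch embeddings."""
    def __init__(self, dim: int = 512, num_classes: int = 2):
        super().__init__()
        self.encoder = ContextAwarePatchEncoder(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.encoder(patches).mean(dim=1))


if __name__ == "__main__":
    bag = torch.randn(1, 4096, 512)   # one slide represented as 4096 patch features
    logits = MeanMIL()(bag)
    print(logits.shape)               # torch.Size([1, 2])

Because attention only ever pairs the N patches with the K global tokens (and the K tokens with each other), the cost grows as O(N·K) rather than O(N²), which is the linear-in-bag-size property the abstract refers to.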

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-lolos26a,
  title     = {Context-Aware Patch Representations for Multiple Instance Learning},
  author    = {Lolos, Andreas and Christodoulou, Theofilos and Moustakas, Aris L. and Christodoulidis, Stergios and Vakalopoulou, Maria},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages     = {510--543},
  year      = {2026},
  editor    = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume    = {315},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/lolos26a/lolos26a.pdf},
  url       = {https://proceedings.mlr.press/v315/lolos26a.html},
  abstract  = {In computational pathology, weak supervision has become the standard for deep learning due to the gigapixel scale of whole-slide images (WSIs) and the scarcity of pixel-level annotations, with Multiple Instance Learning (MIL) established as the principal framework for slide-level model training. In this paper, we introduce CAPRMIL, a novel setting for MIL methods, inspired by advances in Neural Partial Differential Equation (PDE) solvers. Instead of relying on complex attention-based aggregation, we propose an efficient, aggregator-agnostic framework that removes the complexity of correlation learning from the MIL aggregator. CAPRMIL produces rich context-aware patch embeddings that promote effective correlation learning on downstream tasks. By projecting patch features (extracted with a frozen patch encoder) into a small set of global context- and morphology-aware tokens and applying multi-head self-attention, CAPRMIL injects global context with linear computational complexity with respect to the bag size. Paired with a simple Mean MIL aggregator, CAPRMIL matches state-of-the-art (SOTA) slide-level performance across multiple public pathology benchmarks, while reducing the total number of trainable parameters by 48\%--92.8\% versus SOTA MILs, lowering FLOPs during inference by 52\%--99\%, and ranking among the best models in GPU memory efficiency and training time. Our results indicate that learning rich, context-aware instance representations before aggregation is an effective and scalable alternative to complex pooling for whole-slide analysis.}
}
Endnote
%0 Conference Paper
%T Context-Aware Patch Representations for Multiple Instance Learning
%A Andreas Lolos
%A Theofilos Christodoulou
%A Aris L. Moustakas
%A Stergios Christodoulidis
%A Maria Vakalopoulou
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-lolos26a
%I PMLR
%P 510--543
%U https://proceedings.mlr.press/v315/lolos26a.html
%V 315
%X In computational pathology, weak supervision has become the standard for deep learning due to the gigapixel scale of whole-slide images (WSIs) and the scarcity of pixel-level annotations, with Multiple Instance Learning (MIL) established as the principal framework for slide-level model training. In this paper, we introduce CAPRMIL, a novel setting for MIL methods, inspired by advances in Neural Partial Differential Equation (PDE) solvers. Instead of relying on complex attention-based aggregation, we propose an efficient, aggregator-agnostic framework that removes the complexity of correlation learning from the MIL aggregator. CAPRMIL produces rich context-aware patch embeddings that promote effective correlation learning on downstream tasks. By projecting patch features (extracted with a frozen patch encoder) into a small set of global context- and morphology-aware tokens and applying multi-head self-attention, CAPRMIL injects global context with linear computational complexity with respect to the bag size. Paired with a simple Mean MIL aggregator, CAPRMIL matches state-of-the-art (SOTA) slide-level performance across multiple public pathology benchmarks, while reducing the total number of trainable parameters by 48%–92.8% versus SOTA MILs, lowering FLOPs during inference by 52%–99%, and ranking among the best models in GPU memory efficiency and training time. Our results indicate that learning rich, context-aware instance representations before aggregation is an effective and scalable alternative to complex pooling for whole-slide analysis.
APA
Lolos, A., Christodoulou, T., Moustakas, A.L., Christodoulidis, S. & Vakalopoulou, M. (2026). Context-Aware Patch Representations for Multiple Instance Learning. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:510-543. Available from https://proceedings.mlr.press/v315/lolos26a.html.