[edit]
Context-Aware Patch Representations for Multiple Instance Learning
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:510-543, 2026.
Abstract
In computational pathology, weak supervision has become the standard for deep learning due to the gigapixel scale of WSIs and the scarcity of pixel-level annotations, with Multiple Instance Learning (MIL) established as the principal framework for slide-level model training. In this paper, we introduce , a novel setting for MIL methods, inspired by advances in Neural Partial Differential Equation (PDE) solvers. Instead of relying on complex attention-based aggregation, we propose an efficient, aggregator-agnostic framework that removes the complexity of correlation learning from the MIL aggregator. CAPRMIL produces rich context-aware patch embeddings that promote effective correlation learning on downstream tasks. By projecting patch features —extracted using a frozen patch encoder— into a small set of global context/morphology-aware tokens and utilizing multi-head self-attention, CAPRMIL injects global context with linear computational complexity with respect to the bag size. Paired with a simple Mean MIL aggregator, CAPRMIL matches state-of-the-art (SOTA) slide-level performance across multiple public pathology benchmarks, while reducing the total number of trainable parameters by $48%$–$92.8%$ versus SOTA MILs, lowering FLOPs during inference by $52%$–$99%$, and ranking among the best models on GPU memory efficiency and training time. Our results indicate that learning rich, context-aware instance representations before aggregation is an effective and scalable alternative to complex pooling for whole-slide analysis.