DTC-WSI: Dynamic Token Compression for Whole-Slide Images
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:3846-3865, 2026.
Abstract
Whole-slide images (WSIs) contain tens of thousands of heterogeneous patches, making transformer-based multiple-instance learning (MIL) computationally expensive due to quadratic attention costs and substantial redundancy in tissue morphology. Existing token-reduction approaches for WSI analysis rely primarily on pruning, which discards information early in training and destabilizes optimization under weak supervision. We propose Dynamic Token Compression for Whole-Slide Images (DTC-WSI), a token-efficient MIL framework that performs progressive, importance-aware WSI compression. DTC-WSI integrates a lightweight saliency network with a multi-stage token compressor that combines bipartite similarity matching and soft differentiable pruning to gradually eliminate redundant or non-diagnostic patches. During training, soft gates enable stable gradient flow, while inference employs deterministic compression for substantial acceleration. This curriculum-style compression preserves discriminative morphology and dramatically reduces computational burden. Across four WSI benchmarks (TCGA-NSCLC, TCGA-BRCA, TCGA-RCC, PANDA), DTC-WSI achieves 5–10$\times$ token reduction, up to 5.3$\times$ faster inference, and 20–40% lower memory usage, while improving MIL classification accuracy by 2–4% over state-of-the-art baselines. Our results demonstrate that dynamic token compression is a powerful and scalable alternative to pruning, enabling efficient transformer-based WSI analysis while improving accuracy.
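The bipartite similarity matching underlying the token compressor can be illustrated with a minimal ToMe-style merge: split the patch tokens into two alternating sets, match each source token to its most similar destination token by cosine similarity, and average the r most redundant pairs. This is a sketch under our own assumptions (the function name, the alternating split, and the pair-averaging rule are illustrative, not the DTC-WSI reference implementation, which also involves a saliency network and soft gates):

```python
import numpy as np

def bipartite_merge(tokens: np.ndarray, r: int) -> np.ndarray:
    """Illustrative bipartite token merge: reduce (N, D) tokens to (N - r, D).

    Not the DTC-WSI reference code; a minimal sketch of similarity-based
    token compression for patch embeddings.
    """
    src, dst = tokens[0::2], tokens[1::2]        # alternating bipartite split
    # Cosine similarity between every src token and every dst token.
    a = src / np.linalg.norm(src, axis=1, keepdims=True)
    b = dst / np.linalg.norm(dst, axis=1, keepdims=True)
    sim = a @ b.T                                # shape (|src|, |dst|)
    best_dst = sim.argmax(axis=1)                # best partner per src token
    best_sim = sim.max(axis=1)
    merge_idx = np.argsort(-best_sim)[:r]        # r most redundant src tokens
    keep_idx = np.setdiff1d(np.arange(len(src)), merge_idx)

    dst = dst.copy()
    for i in merge_idx:                          # fold each merged src token
        j = best_dst[i]                          # into its matched dst token
        dst[j] = (dst[j] + src[i]) / 2.0
    return np.concatenate([src[keep_idx], dst], axis=0)

rng = np.random.default_rng(0)
out = bipartite_merge(rng.normal(size=(16, 8)), r=4)
print(out.shape)  # (12, 8): 4 of 16 tokens merged away
```

Applying such a merge at several transformer stages yields the progressive, curriculum-style compression the abstract describes; at inference the same matching can be run deterministically, which is where the reported speedups come from.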