KidneyGrader: Fine-Grained Tubulitis Scoring Using Weakly Supervised Transformers

Abrar Rashid; Vishal Jain; Sarah Cechnicka; Aamir Chaudry; Candice Roufosse; Bernhard Kainz

Proceedings of the MICCAI Workshop on Computational Pathology, PMLR 316:106-115, 2026.

Abstract

Accurate tubulitis scoring is essential for managing kidney transplant rejection, yet manual assessment is subjective and suffers from severe inter-rater variability ($\kappa_w = 0.17$), leading to inconsistent treatment decisions. While recent works have attempted binary tubulitis detection, fine-grained scoring (T0-T3) required for clinical decision-making remains unaddressed. We present the first automated approach for granular tubulitis scoring using only slide-level supervision. Our approach aggregates spatially correlated features from tubule-centric image patches using a transformer-based attention pooling mechanism. To ensure diagnostic focus, patches are pre-filtered using a segmentation model trained to detect renal tubules, restricting the input space to regions most relevant for scoring. Evaluated on 93 routine PAS-stained slides (75 for training/validation, 18 held-out test), our method achieves a weighted kappa of $\kappa_w = 0.75$ (4.4$\times$ improvement over expert agreement), 83.3% within-one-grade accuracy, and strong correlation with expert scores (r = 0.81). Top-attended regions demonstrate clinical plausibility, showing progressively greater inflammatory burden and tissue damage features with increasing T-scores. Our work demonstrates that weakly supervised learning can transform subjective pathology assessments into reliable, interpretable predictions, offering a practical path towards standardising transplant rejection diagnosis. The code is available on GitHub.

Cite this Paper

BibTeX

@InProceedings{pmlr-v316-rashid26a,
  title = 	 {KidneyGrader: Fine-Grained Tubulitis Scoring Using Weakly Supervised Transformers},
  author =       {Rashid, Abrar and Jain, Vishal and Cechnicka, Sarah and Chaudry, Aamir and Roufosse, Candice and Kainz, Bernhard},
  booktitle = 	 {Proceedings of the MICCAI Workshop on Computational Pathology},
  pages = 	 {106--115},
  year = 	 {2026},
  editor = 	 {Studer, Linda and Ciompi, Francesco and Khalili, Nadieh and Faryna, Khrystyna and Yeong, Joe and Lau, Mai Chan and Chen, Hao and Liu, Ziyi and Brattoli, Biagio},
  volume = 	 {316},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {27 Sep},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v316/main/assets/rashid26a/rashid26a.pdf},
  url = 	 {https://proceedings.mlr.press/v316/rashid26a.html},
  abstract = 	 {Accurate tubulitis scoring is essential for managing kidney transplant rejection, yet manual assessment is subjective and suffers from severe inter-rater variability ($\kappa_w = 0.17$), leading to inconsistent treatment decisions. While recent works have attempted binary tubulitis detection, fine-grained scoring (T0-T3) required for clinical decision-making remains unaddressed. We present the first automated approach for granular tubulitis scoring using only slide-level supervision. Our approach aggregates spatially correlated features from tubule-centric image patches using a transformer-based attention pooling mechanism. To ensure diagnostic focus, patches are pre-filtered using a segmentation model trained to detect renal tubules, restricting the input space to regions most relevant for scoring. Evaluated on 93 routine PAS-stained slides (75 for training/validation, 18 held-out test), our method achieves a weighted kappa of $\kappa_w = 0.75$ (4.4$\times$ improvement over expert agreement), 83.3% within-one-grade accuracy, and strong correlation with expert scores (r = 0.81). Top-attended regions demonstrate clinical plausibility, showing progressively greater inflammatory burden and tissue damage features with increasing T-scores. Our work demonstrates that weakly supervised learning can transform subjective pathology assessments into reliable, interpretable predictions, offering a practical path towards standardising transplant rejection diagnosis. The code is available on GitHub.}
}

Endnote

%0 Conference Paper
%T KidneyGrader: Fine-Grained Tubulitis Scoring Using Weakly Supervised Transformers
%A Abrar Rashid
%A Vishal Jain
%A Sarah Cechnicka
%A Aamir Chaudry
%A Candice Roufosse
%A Bernhard Kainz
%B Proceedings of the MICCAI Workshop on Computational Pathology
%C Proceedings of Machine Learning Research
%D 2026
%E Linda Studer
%E Francesco Ciompi
%E Nadieh Khalili
%E Khrystyna Faryna
%E Joe Yeong
%E Mai Chan Lau
%E Hao Chen
%E Ziyi Liu
%E Biagio Brattoli
%F pmlr-v316-rashid26a
%I PMLR
%P 106--115
%U https://proceedings.mlr.press/v316/rashid26a.html
%V 316
%X Accurate tubulitis scoring is essential for managing kidney transplant rejection, yet manual assessment is subjective and suffers from severe inter-rater variability ($\kappa_w = 0.17$), leading to inconsistent treatment decisions. While recent works have attempted binary tubulitis detection, fine-grained scoring (T0-T3) required for clinical decision-making remains unaddressed. We present the first automated approach for granular tubulitis scoring using only slide-level supervision. Our approach aggregates spatially correlated features from tubule-centric image patches using a transformer-based attention pooling mechanism. To ensure diagnostic focus, patches are pre-filtered using a segmentation model trained to detect renal tubules, restricting the input space to regions most relevant for scoring. Evaluated on 93 routine PAS-stained slides (75 for training/validation, 18 held-out test), our method achieves a weighted kappa of $\kappa_w = 0.75$ (4.4$\times$ improvement over expert agreement), 83.3% within-one-grade accuracy, and strong correlation with expert scores (r = 0.81). Top-attended regions demonstrate clinical plausibility, showing progressively greater inflammatory burden and tissue damage features with increasing T-scores. Our work demonstrates that weakly supervised learning can transform subjective pathology assessments into reliable, interpretable predictions, offering a practical path towards standardising transplant rejection diagnosis. The code is available on GitHub.

APA

Rashid, A., Jain, V., Cechnicka, S., Chaudry, A., Roufosse, C. & Kainz, B. (2026). KidneyGrader: Fine-Grained Tubulitis Scoring Using Weakly Supervised Transformers. Proceedings of the MICCAI Workshop on Computational Pathology, in Proceedings of Machine Learning Research 316:106-115. Available from https://proceedings.mlr.press/v316/rashid26a.html.
