KidneyGrader: Fine-Grained Tubulitis Scoring Using Weakly Supervised Transformers

Abrar Rashid, Vishal Jain, Sarah Cechnicka, Aamir Chaudry, Candice Roufosse, Bernhard Kainz
Proceedings of the MICCAI Workshop on Computational Pathology, PMLR 316:106-115, 2026.

Abstract

Accurate tubulitis scoring is essential for managing kidney transplant rejection, yet manual assessment is subjective and suffers from severe inter-rater variability ($\kappa_w$ = 0.17), leading to inconsistent treatment decisions. While recent works have attempted binary tubulitis detection, fine-grained scoring (T0-T3) required for clinical decision-making remains unaddressed. We present the first automated approach for granular tubulitis scoring using only slide-level supervision. Our approach aggregates spatially correlated features from tubule-centric image patches using a transformer-based attention pooling mechanism. To ensure diagnostic focus, patches are pre-filtered using a segmentation model trained to detect renal tubules, restricting the input space to regions most relevant for scoring. Evaluated on 93 routine PAS-stained slides (75 for training/validation, 18 held-out test), our method achieves a weighted kappa of $\kappa_w$ = 0.75 (4.4$\times$ improvement over expert agreement), 83.3% within-one-grade accuracy, and strong correlation with expert scores (r = 0.81). Top-attended regions demonstrate clinical plausibility, showing progressively greater inflammatory burden and tissue damage features with increasing T-scores. Our work demonstrates that weakly supervised learning can transform subjective pathology assessments into reliable, interpretable predictions, offering a practical path towards standardising transplant rejection diagnosis. The code is available on GitHub.
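The two agreement metrics quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `weighted_kappa` and `within_one_accuracy` are hypothetical, and the abstract does not say whether the weighting is linear or quadratic, so the quadratic default here is an assumption. The final lines only re-derive the reported figures by arithmetic (0.75 / 0.17 is roughly 4.4, and 83.3% of 18 test slides corresponds to 15 slides, which is an inference from the numbers given).

```python
from collections import Counter


def weighted_kappa(a, b, n_classes=4, power=2):
    """Cohen's weighted kappa for two lists of integer grades in [0, n_classes).

    power=1 gives linear weights, power=2 quadratic weights. Which one the
    paper uses is not stated in the abstract; quadratic is an assumption.
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    observed = Counter(zip(a, b))        # joint counts of (grade_a, grade_b)
    hist_a, hist_b = Counter(a), Counter(b)
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = abs(i - j) ** power
            num += w * observed[(i, j)] / n
            den += w * hist_a[i] * hist_b[j] / (n * n)
    if den == 0:
        return float("nan")  # undefined when both raters use one identical grade
    return 1.0 - num / den


def within_one_accuracy(a, b):
    """Fraction of cases where the two grades differ by at most one."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


# Toy T0-T3 grades, invented purely for illustration (not data from the paper).
expert = [0, 1, 2, 3]
model = [0, 1, 2, 2]
toy_kappa = weighted_kappa(expert, model)
toy_within_one = within_one_accuracy(expert, model)

# Arithmetic check of the figures reported in the abstract.
improvement = round(0.75 / 0.17, 1)        # ratio of the two kappa values
within_one_pct = round(15 / 18 * 100, 1)   # 15 of 18 slides, inferred
```

With `power=1` the same function gives the linearly weighted variant, which penalises a two-grade error only twice as much as a one-grade error rather than four times.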

Cite this Paper


BibTeX
@InProceedings{pmlr-v316-rashid26a,
  title = {KidneyGrader: Fine-Grained Tubulitis Scoring Using Weakly Supervised Transformers},
  author = {Rashid, Abrar and Jain, Vishal and Cechnicka, Sarah and Chaudry, Aamir and Roufosse, Candice and Kainz, Bernhard},
  booktitle = {Proceedings of the MICCAI Workshop on Computational Pathology},
  pages = {106--115},
  year = {2026},
  editor = {Studer, Linda and Ciompi, Francesco and Khalili, Nadieh and Faryna, Khrystyna and Yeong, Joe and Lau, Mai Chan and Chen, Hao and Liu, Ziyi and Brattoli, Biagio},
  volume = {316},
  series = {Proceedings of Machine Learning Research},
  month = {27 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v316/main/assets/rashid26a/rashid26a.pdf},
  url = {https://proceedings.mlr.press/v316/rashid26a.html},
  abstract = {Accurate tubulitis scoring is essential for managing kidney transplant rejection, yet manual assessment is subjective and suffers from severe inter-rater variability ($\kappa_w$ = 0.17), leading to inconsistent treatment decisions. While recent works have attempted binary tubulitis detection, fine-grained scoring (T0-T3) required for clinical decision-making remains unaddressed. We present the first automated approach for granular tubulitis scoring using only slide-level supervision. Our approach aggregates spatially correlated features from tubule-centric image patches using a transformer-based attention pooling mechanism. To ensure diagnostic focus, patches are pre-filtered using a segmentation model trained to detect renal tubules, restricting the input space to regions most relevant for scoring. Evaluated on 93 routine PAS-stained slides (75 for training/validation, 18 held-out test), our method achieves a weighted kappa of $\kappa_w$ = 0.75 (4.4$\times$ improvement over expert agreement), 83.3% within-one-grade accuracy, and strong correlation with expert scores (r = 0.81). Top-attended regions demonstrate clinical plausibility, showing progressively greater inflammatory burden and tissue damage features with increasing T-scores. Our work demonstrates that weakly supervised learning can transform subjective pathology assessments into reliable, interpretable predictions, offering a practical path towards standardising transplant rejection diagnosis. The code is available on GitHub.}
}
Endnote
%0 Conference Paper
%T KidneyGrader: Fine-Grained Tubulitis Scoring Using Weakly Supervised Transformers
%A Abrar Rashid
%A Vishal Jain
%A Sarah Cechnicka
%A Aamir Chaudry
%A Candice Roufosse
%A Bernhard Kainz
%B Proceedings of the MICCAI Workshop on Computational Pathology
%C Proceedings of Machine Learning Research
%D 2026
%E Linda Studer
%E Francesco Ciompi
%E Nadieh Khalili
%E Khrystyna Faryna
%E Joe Yeong
%E Mai Chan Lau
%E Hao Chen
%E Ziyi Liu
%E Biagio Brattoli
%F pmlr-v316-rashid26a
%I PMLR
%P 106--115
%U https://proceedings.mlr.press/v316/rashid26a.html
%V 316
%X Accurate tubulitis scoring is essential for managing kidney transplant rejection, yet manual assessment is subjective and suffers from severe inter-rater variability ($\kappa_w$ = 0.17), leading to inconsistent treatment decisions. While recent works have attempted binary tubulitis detection, fine-grained scoring (T0-T3) required for clinical decision-making remains unaddressed. We present the first automated approach for granular tubulitis scoring using only slide-level supervision. Our approach aggregates spatially correlated features from tubule-centric image patches using a transformer-based attention pooling mechanism. To ensure diagnostic focus, patches are pre-filtered using a segmentation model trained to detect renal tubules, restricting the input space to regions most relevant for scoring. Evaluated on 93 routine PAS-stained slides (75 for training/validation, 18 held-out test), our method achieves a weighted kappa of $\kappa_w$ = 0.75 (4.4$\times$ improvement over expert agreement), 83.3% within-one-grade accuracy, and strong correlation with expert scores (r = 0.81). Top-attended regions demonstrate clinical plausibility, showing progressively greater inflammatory burden and tissue damage features with increasing T-scores. Our work demonstrates that weakly supervised learning can transform subjective pathology assessments into reliable, interpretable predictions, offering a practical path towards standardising transplant rejection diagnosis. The code is available on GitHub.
APA
Rashid, A., Jain, V., Cechnicka, S., Chaudry, A., Roufosse, C. & Kainz, B. (2026). KidneyGrader: Fine-Grained Tubulitis Scoring Using Weakly Supervised Transformers. Proceedings of the MICCAI Workshop on Computational Pathology, in Proceedings of Machine Learning Research 316:106-115. Available from https://proceedings.mlr.press/v316/rashid26a.html.

Related Material