More is more: leveraging multi-rater information for whole slide images grading via virtual expert panel

Jan Grove, Michel Botros, Ylva A Weeda, Clara I. Sánchez, Erik Bekkers, Sybren L Meijer, Hoel Kervadec
Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, PMLR 315:3270-3282, 2026.

Abstract

In medical imaging, datasets with several expert diagnoses capture diagnostic uncertainty, yet many approaches compress these diagnoses into a single consensus label. Because grading Barrett’s esophagus is highly subjective, gradings often diverge, and several expert opinions are needed to mitigate variation in diagnostic or treatment outcomes. Using a multi-rater dataset from the Dutch Esophageal Pathology Panel, we address the issues that come with a compressed label, such as poor calibration and overconfident predictions. Our approach models individual rater behaviors as part of virtual panels, which improves both prediction performance and the quality of uncertainty estimates for clinical decision-making compared to pre-compressed labels. We show that, because each rater correlates individually with the clinical consensus, a combination of raters, especially one that includes all raters, yields higher performance and better-calibrated predictions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v315-grove26a,
  title = {More is more: leveraging multi-rater information for whole slide images grading via virtual expert panel},
  author = {Grove, Jan and Botros, Michel and Weeda, Ylva A and S{\'a}nchez, Clara I. and Bekkers, Erik and Meijer, Sybren L and Kervadec, Hoel},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  pages = {3270--3282},
  year = {2026},
  editor = {Huo, Yuankai and Gao, Mingchen and Kuo, Chang-Fu and Jin, Yueming and Deng, Ruining},
  volume = {315},
  series = {Proceedings of Machine Learning Research},
  month = {08--10 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v315/main/assets/grove26a/grove26a.pdf},
  url = {https://proceedings.mlr.press/v315/grove26a.html},
  abstract = {In medical imaging, datasets with several expert diagnoses capture diagnostic uncertainty, yet many approaches compress diagnoses into a single consensus label. Due to its highly subjective nature, Barrett’s Esophagus gradings often diverge, thus necessitating several expert opinions to mitigate variation in diagnostic or treatment outcomes. Using a multi-rater dataset from the Dutch Esophageal Pathology Panel, we propose an approach to tackle the implied issues such as poor calibration and overconfident predictions that come with a compressed label. We offer an approach that models individual rater behaviors as part of virtual panels, allowing for better prediction performance while also improving the quality of uncertainty estimates for clinical decision-making when compared to pre-compressed labels. We show that due to their individual correlation with the clinical consensus, a combination of raters, especially an inclusion of all raters, yields higher performance and better calibrated predictions.}
}
Endnote
%0 Conference Paper
%T More is more: leveraging multi-rater information for whole slide images grading via virtual expert panel
%A Jan Grove
%A Michel Botros
%A Ylva A Weeda
%A Clara I. Sánchez
%A Erik Bekkers
%A Sybren L Meijer
%A Hoel Kervadec
%B Proceedings of The 9th International Conference on Medical Imaging with Deep Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Yuankai Huo
%E Mingchen Gao
%E Chang-Fu Kuo
%E Yueming Jin
%E Ruining Deng
%F pmlr-v315-grove26a
%I PMLR
%P 3270--3282
%U https://proceedings.mlr.press/v315/grove26a.html
%V 315
%X In medical imaging, datasets with several expert diagnoses capture diagnostic uncertainty, yet many approaches compress diagnoses into a single consensus label. Due to its highly subjective nature, Barrett’s Esophagus gradings often diverge, thus necessitating several expert opinions to mitigate variation in diagnostic or treatment outcomes. Using a multi-rater dataset from the Dutch Esophageal Pathology Panel, we propose an approach to tackle the implied issues such as poor calibration and overconfident predictions that come with a compressed label. We offer an approach that models individual rater behaviors as part of virtual panels, allowing for better prediction performance while also improving the quality of uncertainty estimates for clinical decision-making when compared to pre-compressed labels. We show that due to their individual correlation with the clinical consensus, a combination of raters, especially an inclusion of all raters, yields higher performance and better calibrated predictions.
APA
Grove, J., Botros, M., Weeda, Y. A., Sánchez, C. I., Bekkers, E., Meijer, S. L. & Kervadec, H. (2026). More is more: leveraging multi-rater information for whole slide images grading via virtual expert panel. Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, in Proceedings of Machine Learning Research 315:3270-3282. Available from https://proceedings.mlr.press/v315/grove26a.html.

Related Material