Translating Classifier Scores into Clinical Impact: Calibrated Risk and Queueing Simulation for AI-Assisted Radiology Worklist Triage

Tirthajit Baruah, Punit Rathore
Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 317:60-66, 2026.

Abstract

Radiology worklists are typically processed on a first-in, first-out (FIFO) basis, even when studies differ greatly in clinical urgency. We propose a pragmatic alternative: using calibrated probabilities of intracranial hemorrhage (ICH) to prioritize head CT exams for earlier reading. Using the public RSNA-ICH dataset, we train slice-level detectors, aggregate them to the exam level, apply post-hoc calibration, and feed these scores into a transparent discrete-event simulator of the reading queue. The simulator quantifies the benefit of triage, measured as the reduction in median time-to-read (TTR) for ICH exams, and how that benefit scales with classifier AUC, workload (arrival rate), staffing, prevalence, and calibration. Across realistic loads, score-based prioritization yields substantial TTR reductions for ICH with minimal delay to non-ICH studies. We release a configuration-driven, reproducible pipeline that translates AI risk scores into operational metrics (minutes saved), enabling safe and data-driven evaluation before PACS/RIS deployment.
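The comparison described in the abstract, FIFO versus score-based prioritization in a reading queue, can be illustrated with a small discrete-event sketch. This is not the authors' released pipeline. Everything in it is an assumption made for illustration: the function name `simulate`, Poisson arrivals, exponentially distributed read times, Beta-distributed synthetic risk scores, non-preemptive priority, and TTR defined as the time from exam arrival to completion of the read.

```python
import heapq
import random
import statistics


def simulate(policy, n_exams=2000, arrival_rate=0.18, mean_read_min=5.0,
             n_readers=1, prevalence=0.10, seed=0):
    """Return (median ICH TTR, median non-ICH TTR) in minutes.

    policy: "fifo" reads exams in arrival order; "score" reads the
    highest-scored waiting exam first (non-preemptive).
    All distributions and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Pre-generate the exam stream so both policies see identical exams.
    exams, t = [], 0.0
    for _ in range(n_exams):
        t += rng.expovariate(arrival_rate)             # Poisson arrivals
        is_ich = rng.random() < prevalence
        # Synthetic risk score: tends to be higher for ICH exams.
        score = rng.betavariate(5, 2) if is_ich else rng.betavariate(2, 5)
        service = rng.expovariate(1.0 / mean_read_min)  # read duration
        exams.append((t, is_ich, score, service))

    reader_free = [0.0] * n_readers   # min-heap of reader free-up times
    waiting = []                      # min-heap ordered by the policy key
    ttr_ich, ttr_non = [], []
    now, i = 0.0, 0

    while i < n_exams or waiting:
        # Advance the clock to when the next reader becomes available.
        now = max(now, reader_free[0])
        # If nothing is waiting, the reader idles until the next arrival.
        if not waiting and exams[i][0] > now:
            now = exams[i][0]
        # Admit every exam that has arrived by the current time.
        while i < n_exams and exams[i][0] <= now:
            arr, is_ich, score, service = exams[i]
            key = arr if policy == "fifo" else -score
            heapq.heappush(waiting, (key, arr, is_ich, service))
            i += 1
        # Assign the best waiting exam to the earliest-free reader.
        _, arr, is_ich, service = heapq.heappop(waiting)
        done = now + service
        heapq.heapreplace(reader_free, done)
        (ttr_ich if is_ich else ttr_non).append(done - arr)

    return statistics.median(ttr_ich), statistics.median(ttr_non)


if __name__ == "__main__":
    for policy in ("fifo", "score"):
        ich, non = simulate(policy)
        print(f"{policy:>5}: median TTR ICH={ich:.1f} min, non-ICH={non:.1f} min")
```

Because the exam stream is generated from a fixed seed before any scheduling happens, the two policies are compared on the same arrivals, labels, scores, and read times. Varying `arrival_rate`, `n_readers`, `prevalence`, or the separation between the two score distributions gives a rough feel for the sensitivities the abstract lists, though the numbers from this toy model should not be read as the paper's results.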

Cite this Paper


BibTeX
@InProceedings{pmlr-v317-baruah26a,
  title = {Translating Classifier Scores into Clinical Impact: Calibrated Risk and Queueing Simulation for AI-Assisted Radiology Worklist Triage},
  author = {Baruah, Tirthajit and Rathore, Punit},
  booktitle = {Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare},
  pages = {60--66},
  year = {2026},
  editor = {Wu, Junde and Pan, Jiazhen and Zhu, Jiayuan and Luo, Luyang and Li, Yitong and Xu, Min and Jin, Yueming and Rueckert, Daniel},
  volume = {317},
  series = {Proceedings of Machine Learning Research},
  month = {20--21 Jan},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v317/main/assets/baruah26a/baruah26a.pdf},
  url = {https://proceedings.mlr.press/v317/baruah26a.html},
  abstract = {Radiology worklists are typically processed on a first-in, first-out (FIFO) basis, even when studies differ greatly in clinical urgency. We propose a pragmatic alternative: using calibrated probabilities of intracranial hemorrhage (ICH) to prioritize head CT exams for earlier reading. Using the public RSNA-ICH dataset, we train slice-level detectors, aggregate them to the exam level, apply post-hoc calibration, and feed these scores into a transparent discrete-event simulator of the reading queue. The simulator quantifies the benefit of triage, measured as the reduction in median time-to-read (TTR) for ICH exams, and how that benefit scales with classifier AUC, workload (arrival rate), staffing, prevalence, and calibration. Across realistic loads, score-based prioritization yields substantial TTR reductions for ICH with minimal delay to non-ICH studies. We release a configuration-driven, reproducible pipeline that translates AI risk scores into operational metrics (minutes saved), enabling safe and data-driven evaluation before PACS/RIS deployment.}
}
Endnote
%0 Conference Paper
%T Translating Classifier Scores into Clinical Impact: Calibrated Risk and Queueing Simulation for AI-Assisted Radiology Worklist Triage
%A Tirthajit Baruah
%A Punit Rathore
%B Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare
%C Proceedings of Machine Learning Research
%D 2026
%E Junde Wu
%E Jiazhen Pan
%E Jiayuan Zhu
%E Luyang Luo
%E Yitong Li
%E Min Xu
%E Yueming Jin
%E Daniel Rueckert
%F pmlr-v317-baruah26a
%I PMLR
%P 60--66
%U https://proceedings.mlr.press/v317/baruah26a.html
%V 317
%X Radiology worklists are typically processed on a first-in, first-out (FIFO) basis, even when studies differ greatly in clinical urgency. We propose a pragmatic alternative: using calibrated probabilities of intracranial hemorrhage (ICH) to prioritize head CT exams for earlier reading. Using the public RSNA-ICH dataset, we train slice-level detectors, aggregate them to the exam level, apply post-hoc calibration, and feed these scores into a transparent discrete-event simulator of the reading queue. The simulator quantifies the benefit of triage, measured as the reduction in median time-to-read (TTR) for ICH exams, and how that benefit scales with classifier AUC, workload (arrival rate), staffing, prevalence, and calibration. Across realistic loads, score-based prioritization yields substantial TTR reductions for ICH with minimal delay to non-ICH studies. We release a configuration-driven, reproducible pipeline that translates AI risk scores into operational metrics (minutes saved), enabling safe and data-driven evaluation before PACS/RIS deployment.
APA
Baruah, T. & Rathore, P. (2026). Translating Classifier Scores into Clinical Impact: Calibrated Risk and Queueing Simulation for AI-Assisted Radiology Worklist Triage. Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, in Proceedings of Machine Learning Research 317:60-66. Available from https://proceedings.mlr.press/v317/baruah26a.html.
