Translating Classifier Scores into Clinical Impact: Calibrated Risk and Queueing Simulation for AI-Assisted Radiology Worklist Triage
Proceedings of The Second AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 317:60-66, 2026.
Abstract
Radiology worklists are typically processed on a first-in, first-out (FIFO) basis, even when studies differ greatly in clinical urgency. We propose a pragmatic alternative: using calibrated probabilities of intracranial hemorrhage (ICH) to prioritize head CT exams for earlier reading. Using the public RSNA-ICH dataset, we train slice-level detectors, aggregate their outputs to the exam level, apply post-hoc calibration, and feed the resulting scores into a transparent discrete-event simulator of the reading queue. The simulator quantifies the reduction in median time-to-read (TTR) for ICH studies achieved by triage, and how this benefit scales with classifier AUC, workload (arrival rate), staffing, prevalence, and calibration. Across realistic loads, score-based prioritization yields substantial TTR reductions for ICH with minimal delay to non-ICH studies. We release a configuration-driven, reproducible pipeline that translates AI risk scores into operational metrics (minutes saved), enabling safe, data-driven evaluation before PACS/RIS deployment.
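The FIFO versus score-priority comparison described above can be illustrated with a minimal discrete-event sketch. It assumes Poisson arrivals, exponentially distributed read times, a fixed number of interchangeable readers who never interrupt a read in progress, and exam-level scores drawn from a binormal model instead of a trained detector (with ICH scores ~ N(d, 1) and non-ICH scores ~ N(0, 1), the AUC is Φ(d/√2), so d = 1.5 gives roughly 0.86). The names `make_studies`, `simulate`, and `median_ttr`, and all parameter values, are hypothetical and are not taken from the released pipeline. TTR is measured here from study arrival to completion of the read, which may differ from the paper's exact definition.

```python
import heapq
import random
import statistics
from dataclasses import dataclass


@dataclass
class Study:
    arrival: float  # minutes since start of the simulated shift
    is_ich: bool    # simulated ground-truth label (hypothetical data)
    score: float    # simulated exam-level risk score; higher = more likely ICH


def make_studies(n, rate_per_min, prevalence, separation, seed=0):
    """Hypothetical generator: Poisson arrivals, binormal scores N(separation, 1) vs N(0, 1)."""
    rng = random.Random(seed)
    t, studies = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate_per_min)  # exponential inter-arrival gaps
        is_ich = rng.random() < prevalence
        score = rng.gauss(separation if is_ich else 0.0, 1.0)
        studies.append(Study(t, is_ich, score))
    return studies


def simulate(studies, n_readers, mean_read_min, policy, seed=1):
    """Non-preemptive multi-reader queue; returns TTR (arrival to end of read) per study."""
    if policy not in ("fifo", "score"):
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)
    # Common random numbers: each study keeps the same read duration under every policy.
    read_min = [rng.expovariate(1.0 / mean_read_min) for _ in studies]
    free_at = [0.0] * n_readers  # min-heap of times at which each reader becomes free
    waiting = []                 # min-heap of (priority key, study index)
    ttr = [0.0] * len(studies)
    i = 0
    while i < len(studies) or waiting:
        now = heapq.heappop(free_at)  # the reader who becomes free earliest acts next
        if not waiting:
            # Empty worklist: the reader idles until the next study arrives.
            now = max(now, studies[i].arrival)
        # Move every study that has arrived by `now` onto the worklist.
        while i < len(studies) and studies[i].arrival <= now:
            s = studies[i]
            key = (s.arrival,) if policy == "fifo" else (-s.score, s.arrival)
            heapq.heappush(waiting, (key, i))
            i += 1
        _, j = heapq.heappop(waiting)  # oldest (FIFO) or highest-score study
        start = max(now, studies[j].arrival)
        finish = start + read_min[j]
        heapq.heappush(free_at, finish)
        ttr[j] = finish - studies[j].arrival
    return ttr


def median_ttr(studies, ttr, ich):
    """Median TTR restricted to ICH (ich=True) or non-ICH (ich=False) studies."""
    return statistics.median(t for s, t in zip(studies, ttr) if s.is_ich == ich)


if __name__ == "__main__":
    # Hypothetical load: 30 studies/hour, 4 readers, 7-minute mean read (utilization ~0.875).
    studies = make_studies(20_000, rate_per_min=0.5, prevalence=0.1, separation=1.5)
    for policy in ("fifo", "score"):
        ttr = simulate(studies, n_readers=4, mean_read_min=7.0, policy=policy)
        print(policy,
              "ICH median TTR:", round(median_ttr(studies, ttr, True), 1),
              "non-ICH median TTR:", round(median_ttr(studies, ttr, False), 1))
```

Reusing the same arrivals and read durations under both policies isolates the effect of the ordering rule. Because this sketch ranks by the raw score, a strictly monotone calibration map (for example Platt scaling) would leave the reading order unchanged; calibration becomes relevant when priority depends on a probability threshold or similar decision rule, which the sketch does not model.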