[edit]
Perspective: Listening to Users when Auditing Medical AI Scribes
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:1619-1631, 2026.
Abstract
Medical AI scribes are rapidly being adopted to reduce documentation burdens on clinicians, with systems already deployed across millions of patient visits. While these tools offer substantial efficiency benefits and reduced clinician burnout, they pose serious risks through transcription errors and hallucinations. These risks are disproportionately placed on certain demographics of speakers, from patients with speech disorders to psychiatric illnesses. We argue for more principled audits to be conducted on medical AI scribes, analogous to post-marketing surveillance for medical devices. Our framework for doing so involves: (1) collecting diverse, medically-relevant speech datasets representative of real patient and provider populations, (2) developing metric suites that go beyond the singular gold standard of Word Error Rates, and (3) conducting human-centered design research to align functionality with the needs of both medical providers and patients.