Perspective: Listening to Users when Auditing Medical AI Scribes

Allison Koenecke, John-Jose Nunez, Anaïs Rameau, Irene Y. Chen
Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:1619-1631, 2026.

Abstract

Medical AI scribes are rapidly being adopted to reduce documentation burdens on clinicians, with systems already deployed across millions of patient visits. While these tools offer substantial efficiency benefits and reduced clinician burnout, they pose serious risks through transcription errors and hallucinations. These risks are disproportionately placed on certain demographics of speakers, from patients with speech disorders to psychiatric illnesses. We argue for more principled audits to be conducted on medical AI scribes, analogous to post-marketing surveillance for medical devices. Our framework for doing so involves: (1) collecting diverse, medically-relevant speech datasets representative of real patient and provider populations, (2) developing metric suites that go beyond the singular gold standard of Word Error Rates, and (3) conducting human-centered design research to align functionality with the needs of both medical providers and patients.

Cite this Paper


BibTeX
@InProceedings{pmlr-v297-koenecke26a,
  title = {Perspective: Listening to Users when Auditing Medical AI Scribes},
  author = {Koenecke, Allison and Nunez, John-Jose and Rameau, Ana{\"i}s and Chen, Irene Y.},
  booktitle = {Proceedings of the Fifth Machine Learning for Health Symposium},
  pages = {1619--1631},
  year = {2026},
  editor = {Argaw, Peniel and Zhang, Haoran and Jabbour, Sarah and Chandak, Payal and Ji, Jerry and Mukherjee, Sumit and Salaudeen, Olawale and Chang, Trenton and Healey, Elizabeth and Gröger, Fabian and Adibi, Amin and Hegselmann, Stefan and Wild, Benjamin and Noori, Ayush},
  volume = {297},
  series = {Proceedings of Machine Learning Research},
  month = {13--14 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v297/main/assets/koenecke26a/koenecke26a.pdf},
  url = {https://proceedings.mlr.press/v297/koenecke26a.html},
  abstract = {Medical AI scribes are rapidly being adopted to reduce documentation burdens on clinicians, with systems already deployed across millions of patient visits. While these tools offer substantial efficiency benefits and reduced clinician burnout, they pose serious risks through transcription errors and hallucinations. These risks are disproportionately placed on certain demographics of speakers, from patients with speech disorders to psychiatric illnesses. We argue for more principled audits to be conducted on medical AI scribes, analogous to post-marketing surveillance for medical devices. Our framework for doing so involves: (1) collecting diverse, medically-relevant speech datasets representative of real patient and provider populations, (2) developing metric suites that go beyond the singular gold standard of Word Error Rates, and (3) conducting human-centered design research to align functionality with the needs of both medical providers and patients.}
}
Endnote
%0 Conference Paper
%T Perspective: Listening to Users when Auditing Medical AI Scribes
%A Allison Koenecke
%A John-Jose Nunez
%A Anaïs Rameau
%A Irene Y. Chen
%B Proceedings of the Fifth Machine Learning for Health Symposium
%C Proceedings of Machine Learning Research
%D 2026
%E Peniel Argaw
%E Haoran Zhang
%E Sarah Jabbour
%E Payal Chandak
%E Jerry Ji
%E Sumit Mukherjee
%E Olawale Salaudeen
%E Trenton Chang
%E Elizabeth Healey
%E Fabian Gröger
%E Amin Adibi
%E Stefan Hegselmann
%E Benjamin Wild
%E Ayush Noori
%F pmlr-v297-koenecke26a
%I PMLR
%P 1619--1631
%U https://proceedings.mlr.press/v297/koenecke26a.html
%V 297
%X Medical AI scribes are rapidly being adopted to reduce documentation burdens on clinicians, with systems already deployed across millions of patient visits. While these tools offer substantial efficiency benefits and reduced clinician burnout, they pose serious risks through transcription errors and hallucinations. These risks are disproportionately placed on certain demographics of speakers, from patients with speech disorders to psychiatric illnesses. We argue for more principled audits to be conducted on medical AI scribes, analogous to post-marketing surveillance for medical devices. Our framework for doing so involves: (1) collecting diverse, medically-relevant speech datasets representative of real patient and provider populations, (2) developing metric suites that go beyond the singular gold standard of Word Error Rates, and (3) conducting human-centered design research to align functionality with the needs of both medical providers and patients.
APA
Koenecke, A., Nunez, J., Rameau, A. & Chen, I.Y. (2026). Perspective: Listening to Users when Auditing Medical AI Scribes. Proceedings of the Fifth Machine Learning for Health Symposium, in Proceedings of Machine Learning Research 297:1619-1631. Available from https://proceedings.mlr.press/v297/koenecke26a.html.
