Detecting Whisper Hallucinations with Local Confidence Contrasts
Proceedings of the Fourth Swiss AI Days, PMLR 309:38-45, 2026.
Abstract
Automatic speech recognition has advanced significantly with models such as Whisper, yet confident hallucinations remain a critical challenge. In this work, we propose a lightweight and interpretable error detection framework that augments acoustic confidence with explicit contextual features. We introduce the Local Confidence Drop, a novel metric designed to capture sudden dips in confidence between neighboring tokens. Evaluated on the FLEURS dataset, our random forest classifier achieves an average precision (AP) of 0.64, consistently outperforming the baseline (p < 0.001). Crucially, we demonstrate that hallucinations manifest as local contextual discontinuities, providing a transparent alternative to opaque neural post-processors.
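The abstract does not state the exact formula for the Local Confidence Drop. The following is a minimal sketch under an assumed definition, offered only to illustrate the idea of a "dip between neighboring tokens". For each token, it measures how far that token's confidence falls below the mean confidence of its neighbors within a window. The function name `local_confidence_drop`, the `window` parameter, and the use of per-token probabilities as confidences are all hypothetical choices, not the paper's method.

```python
from typing import List


def local_confidence_drop(confidences: List[float], window: int = 1) -> List[float]:
    """Hypothetical Local Confidence Drop (illustrative assumption, not the paper's formula).

    For each token i, compare its confidence with the mean confidence of the
    tokens within `window` positions on either side. Only downward dips are
    scored; a token at least as confident as its context scores 0.
    """
    n = len(confidences)
    drops: List[float] = []
    for i, c in enumerate(confidences):
        # Gather neighbors on both sides, excluding the token itself.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbors = [confidences[j] for j in range(lo, hi) if j != i]
        if not neighbors:
            # A single-token sequence has no local context to compare against.
            drops.append(0.0)
            continue
        local_mean = sum(neighbors) / len(neighbors)
        # Positive only when the token is less confident than its neighborhood.
        drops.append(max(0.0, local_mean - c))
    return drops


if __name__ == "__main__":
    # Assumed per-token confidences: the third token dips sharply.
    conf = [0.95, 0.93, 0.40, 0.94, 0.96]
    print([round(d, 3) for d in local_confidence_drop(conf)])
    # prints [0.0, 0.0, 0.535, 0.0, 0.0]
```

A per-token feature like this could then sit alongside the raw acoustic confidence as an input to a classifier. The one-sided `max(0, ...)` is a design choice that makes isolated dips stand out while ignoring tokens that are more confident than their surroundings.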