Applying the conformal prediction paradigm for the uncertainty quantification of an end-to-end automatic speech recognition model (wav2vec 2.0)
Proceedings of the Twelfth Symposium on Conformal
and Probabilistic Prediction with Applications, PMLR 204:16-35, 2023.
Abstract
Uncertainty quantification is critical when using
Automatic Speech Recognition (ASR) in high-risk
systems where safety is paramount. In developing
ASR models adapted to such contexts, a range of
techniques is being explored to measure the
uncertainty of their predictions. In this
paper, we present two algorithms: the first one
applies the Conformal Risk Control paradigm to
predict a set of sentences that controls the Word
Error Rate (WER) at an adjustable level of
guarantee. The second algorithm uses Inductive
Conformal Prediction (ICP) to predict uncertain
words in an automatic transcription. We analyze the
performance of the two algorithms using an
open-source ASR model based on Wav2vec 2.0. The CP
algorithms were calibrated on the “clean test” part of
the LibriSpeech corpus that contains approximately
2,600 sentences. The results show that both
algorithms provide valid and efficient prediction
sets. We guarantee that the WER is below 2% with a
confidence level of 80% and an average set size of
29 sentences, and we detect 90% of the incorrectly
transcribed words.
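The first algorithm rests on the Conformal Risk Control recipe: grow the prediction set until an adjusted empirical risk on calibration data drops below the target level. A minimal sketch of that calibration step, assuming synthetic per-sentence losses in place of real WER values (the data, the set-size grid `lambdas`, and the exponential decay model are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration data: for each utterance, assume the loss
# (e.g. best WER achievable within a set of the top-lambda ASR
# hypotheses) shrinks as the set-size parameter lambda grows.
n = 500
lambdas = np.arange(1, 51)             # candidate set sizes
base = rng.uniform(0.0, 0.3, size=n)   # per-utterance WER floor (synthetic)
# loss[i, j]: loss for utterance i with set size lambdas[j]
loss = np.clip(base[:, None] * np.exp(-lambdas[None, :] / 10.0), 0.0, 1.0)

alpha = 0.02  # target risk level (WER below 2%)
B = 1.0       # assumed upper bound on the loss

# Conformal Risk Control: choose the smallest lambda whose
# finite-sample-adjusted empirical risk is at most alpha.
risk = loss.mean(axis=0)
adjusted = (n / (n + 1)) * risk + B / (n + 1)
lam_hat = int(lambdas[np.argmax(adjusted <= alpha)])
print(lam_hat)
```

Because the adjusted risk is monotone in the set size here, the first index where it crosses `alpha` gives the smallest valid set; on real data the same search runs over the model's n-best hypothesis lists.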
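The second algorithm, flagging uncertain words with Inductive Conformal Prediction, reduces to calibrating a quantile of nonconformity scores on held-out data. A minimal sketch, assuming the score of a word is one minus the model's confidence (all values below are synthetic stand-ins, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Calibration set: nonconformity score = 1 - model confidence for
# correctly transcribed words. A Beta distribution stands in for real
# confidences here (illustrative assumption).
cal_scores = 1.0 - rng.beta(8, 2, size=1000)

eps = 0.10  # miscoverage level: flag at most ~10% of well-transcribed words
n = len(cal_scores)
# Split-conformal quantile with the usual finite-sample correction.
level = min(np.ceil((n + 1) * (1 - eps)) / n, 1.0)
q = np.quantile(cal_scores, level, method="higher")

# At test time, flag words whose score exceeds the calibrated threshold
# as potentially badly transcribed.
test_scores = 1.0 - np.array([0.95, 0.40, 0.88, 0.15])
flagged = test_scores > q
print(q, flagged)
```

The exchangeability of calibration and test words is what turns this simple thresholding into a coverage guarantee: a correctly transcribed test word is flagged with probability at most `eps`.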