Applying the conformal prediction paradigm for the uncertainty quantification of an end-to-end automatic speech recognition model (wav2vec 2.0)

Fares Ernez, Alexandre Arnold, Audrey Galametz, Catherine Kobus, Nawal Ould-Amer
Proceedings of the Twelfth Symposium on Conformal and Probabilistic Prediction with Applications, PMLR 204:16-35, 2023.

Abstract

Uncertainty quantification is critical when using Automatic Speech Recognition (ASR) in high-risk systems where safety is paramount. While developing ASR models adapted to such contexts, a range of techniques is being explored to measure the uncertainty of their predictions. In this paper, we present two algorithms: the first applies the Conformal Risk Control paradigm to predict a set of sentences that controls the Word Error Rate (WER) at an adjustable level of guarantee; the second uses Inductive Conformal Prediction (ICP) to flag uncertain words in an automatic transcription. We analyze the performance of both algorithms using an open-source ASR model based on Wav2vec 2.0. The conformal predictors were calibrated on the "test-clean" split of the LibriSpeech corpus, which contains approximately 2,600 sentences. The results show that both algorithms provide valid and efficient prediction sets: we guarantee a WER below 2% with a confidence level of 80% at an average set size of 29 sentences, and we detect 90% of incorrectly transcribed words.
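The paper does not come with reference code, but the two procedures described above are concrete enough to sketch. Below is a minimal, illustrative Python sketch of the first algorithm, under assumptions of ours rather than the paper's: candidate sentences come from a ranked n-best list per utterance, the loss of a prediction set is the best WER achievable inside it, and calibration uses the simpler in-expectation Conformal Risk Control bound (the paper reports a high-probability, 80%-confidence guarantee, which needs a stronger calibration step). The names `calibrate_set_size`, `nbest_lists`, and `references` are hypothetical.

```python
import numpy as np

def wer(ref, hyp):
    """Word error rate via Levenshtein distance on word tokens, clipped to
    [0, 1] so the CRC bound below can use B = 1."""
    r, h = ref.split(), hyp.split()
    d = np.zeros((len(r) + 1, len(h) + 1), dtype=int)
    d[:, 0] = np.arange(len(r) + 1)
    d[0, :] = np.arange(len(h) + 1)
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1, j - 1] + (r[i - 1] != h[j - 1])
            d[i, j] = min(sub, d[i - 1, j] + 1, d[i, j - 1] + 1)
    return min(d[-1, -1] / max(len(r), 1), 1.0)

def calibrate_set_size(nbest_lists, references, alpha=0.02):
    """Pick the smallest n-best set size whose risk (best WER achievable
    inside the set) is controlled at level alpha on the calibration data.

    The loss is non-increasing in k, so the prediction sets are nested,
    as the Conformal Risk Control paradigm requires.
    """
    n = len(references)
    max_k = max(len(hyps) for hyps in nbest_lists)
    for k in range(1, max_k + 1):
        losses = [min(wer(ref, hyp) for hyp in hyps[:k])
                  for hyps, ref in zip(nbest_lists, references)]
        # Inflated empirical risk from Conformal Risk Control, with B = 1.
        if (n / (n + 1)) * np.mean(losses) + 1.0 / (n + 1) <= alpha:
            return k
    return max_k
```

The second algorithm can be sketched in the same spirit, assuming the nonconformity score of a word is something like the negative mean log-probability of its tokens under the model; the paper's exact score may differ.

```python
import numpy as np

def make_icp_word_flagger(calib_scores, epsilon=0.1):
    """Inductive Conformal Prediction for word-level error detection.

    calib_scores: nonconformity scores of calibration words known to be
                  correctly transcribed (e.g. negative mean log-probability
                  of the word's tokens -- an assumed score, see above).
    epsilon:      significance level; larger values flag more words.
    """
    calib = np.asarray(calib_scores)
    n = len(calib)

    def flag(word_score):
        # ICP p-value: fraction of calibration scores at least as extreme.
        p = (np.sum(calib >= word_score) + 1) / (n + 1)
        return p <= epsilon  # True -> mark the word as uncertain

    return flag
```

Under exchangeability of calibration and test words, epsilon = 0.1 caps the rate at which correctly transcribed words are falsely flagged at roughly 10%; the 90% detection rate of erroneous words reported in the abstract is an empirical outcome on LibriSpeech rather than a guarantee of the method.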

Cite this Paper


BibTeX
@InProceedings{pmlr-v204-ernez23a,
  title     = {Applying the conformal prediction paradigm for the uncertainty quantification of an end-to-end automatic speech recognition model (wav2vec 2.0)},
  author    = {Ernez, Fares and Arnold, Alexandre and Galametz, Audrey and Kobus, Catherine and Ould-Amer, Nawal},
  booktitle = {Proceedings of the Twelfth Symposium on Conformal and Probabilistic Prediction with Applications},
  pages     = {16--35},
  year      = {2023},
  editor    = {Papadopoulos, Harris and Nguyen, Khuong An and Boström, Henrik and Carlsson, Lars},
  volume    = {204},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Sep},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v204/ernez23a/ernez23a.pdf},
  url       = {https://proceedings.mlr.press/v204/ernez23a.html}
}
Endnote
%0 Conference Paper
%T Applying the conformal prediction paradigm for the uncertainty quantification of an end-to-end automatic speech recognition model (wav2vec 2.0)
%A Fares Ernez
%A Alexandre Arnold
%A Audrey Galametz
%A Catherine Kobus
%A Nawal Ould-Amer
%B Proceedings of the Twelfth Symposium on Conformal and Probabilistic Prediction with Applications
%C Proceedings of Machine Learning Research
%D 2023
%E Harris Papadopoulos
%E Khuong An Nguyen
%E Henrik Boström
%E Lars Carlsson
%F pmlr-v204-ernez23a
%I PMLR
%P 16--35
%U https://proceedings.mlr.press/v204/ernez23a.html
%V 204
APA
Ernez, F., Arnold, A., Galametz, A., Kobus, C. & Ould-Amer, N. (2023). Applying the conformal prediction paradigm for the uncertainty quantification of an end-to-end automatic speech recognition model (wav2vec 2.0). Proceedings of the Twelfth Symposium on Conformal and Probabilistic Prediction with Applications, in Proceedings of Machine Learning Research 204:16-35. Available from https://proceedings.mlr.press/v204/ernez23a.html.