Word-Level Speech Recognition With a Letter to Word Encoder

Ronan Collobert, Awni Hannun, Gabriel Synnaeve
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2100-2110, 2020.

Abstract

We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-collobert20a,
  title     = {Word-Level Speech Recognition With a Letter to Word Encoder},
  author    = {Collobert, Ronan and Hannun, Awni and Synnaeve, Gabriel},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {2100--2110},
  year      = {2020},
  editor    = {Hal Daumé III and Aarti Singh},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/collobert20a/collobert20a.pdf},
  url       = {http://proceedings.mlr.press/v119/collobert20a.html},
  abstract  = {We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.}
}
Endnote
%0 Conference Paper
%T Word-Level Speech Recognition With a Letter to Word Encoder
%A Ronan Collobert
%A Awni Hannun
%A Gabriel Synnaeve
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-collobert20a
%I PMLR
%P 2100--2110
%U http://proceedings.mlr.press/v119/collobert20a.html
%V 119
%X We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.
APA
Collobert, R., Hannun, A. & Synnaeve, G. (2020). Word-Level Speech Recognition With a Letter to Word Encoder. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2100-2110. Available from http://proceedings.mlr.press/v119/collobert20a.html.