Word-Level Speech Recognition With a Letter to Word Encoder

Ronan Collobert; Awni Hannun; Gabriel Synnaeve

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2100-2110, 2020.

Abstract

We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-collobert20a,
  title = 	 {Word-Level Speech Recognition With a Letter to Word Encoder},
  author =       {Collobert, Ronan and Hannun, Awni and Synnaeve, Gabriel},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {2100--2110},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/collobert20a/collobert20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/collobert20a.html},
  abstract = 	 {We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.}
}

Endnote

%0 Conference Paper
%T Word-Level Speech Recognition With a Letter to Word Encoder
%A Ronan Collobert
%A Awni Hannun
%A Gabriel Synnaeve
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-collobert20a
%I PMLR
%P 2100--2110
%U https://proceedings.mlr.press/v119/collobert20a.html
%V 119
%X We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.

APA

Collobert, R., Hannun, A. & Synnaeve, G. (2020). Word-Level Speech Recognition With a Letter to Word Encoder. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2100-2110. Available from https://proceedings.mlr.press/v119/collobert20a.html.
