A fully differentiable beam search decoder

Ronan Collobert; Awni Hannun; Gabriel Synnaeve

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1341-1350, 2019.

Abstract

We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g. acoustic and language models). It can be used when target sequences are not aligned to input sequences by considering all possible alignments between the two. We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models. The system is end-to-end, with gradients flowing through the whole architecture from the word-level transcriptions. Recent research efforts have shown that deep neural networks with attention-based mechanisms can successfully train an acoustic model from the final transcription, while implicitly learning a language model. Instead, we show that it is possible to discriminatively train an acoustic model jointly with an explicit and possibly pre-trained language model.

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-collobert19a,
  title = {A fully differentiable beam search decoder},
  author = {Collobert, Ronan and Hannun, Awni and Synnaeve, Gabriel},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {1341--1350},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/collobert19a/collobert19a.pdf},
  url = {https://proceedings.mlr.press/v97/collobert19a.html},
  abstract = 	 {We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g. acoustic and language models). It can be used when target sequences are not aligned to input sequences by considering all possible alignments between the two. We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models. The system is end-to-end, with gradients flowing through the whole architecture from the word-level transcriptions. Recent research efforts have shown that deep neural networks with attention-based mechanisms can successfully train an acoustic model from the final transcription, while implicitly learning a language model. Instead, we show that it is possible to discriminatively train an acoustic model jointly with an explicit and possibly pre-trained language model.}
}

Endnote

%0 Conference Paper
%T A fully differentiable beam search decoder
%A Ronan Collobert
%A Awni Hannun
%A Gabriel Synnaeve
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-collobert19a
%I PMLR
%P 1341--1350
%U https://proceedings.mlr.press/v97/collobert19a.html
%V 97
%X We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g. acoustic and language models). It can be used when target sequences are not aligned to input sequences by considering all possible alignments between the two. We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models. The system is end-to-end, with gradients flowing through the whole architecture from the word-level transcriptions. Recent research efforts have shown that deep neural networks with attention-based mechanisms can successfully train an acoustic model from the final transcription, while implicitly learning a language model. Instead, we show that it is possible to discriminatively train an acoustic model jointly with an explicit and possibly pre-trained language model.

APA

Collobert, R., Hannun, A., & Synnaeve, G. (2019). A fully differentiable beam search decoder. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1341-1350. Available from https://proceedings.mlr.press/v97/collobert19a.html.
