Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2188-2197, 2017.

Abstract

Most existing sequence labelling models rely on a fixed decomposition of a target sequence into a sequence of basic units. These methods suffer from two major drawbacks: 1) the set of basic units is fixed, such as the set of words, characters, or phonemes in speech recognition, and 2) the decomposition of target sequences is fixed. These drawbacks usually result in sub-optimal sequence modelling performance. In this paper, we extend the popular CTC loss criterion to alleviate these limitations, and propose a new loss function called Gram-CTC. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, Gram-CTC allows the model to output a variable number of characters at each time step, which enables the model to capture longer-term dependencies and improves computational efficiency. We demonstrate that the proposed Gram-CTC improves over CTC in terms of both performance and efficiency on the large-vocabulary speech recognition task at multiple scales of data, and that with Gram-CTC we can outperform the state of the art on a standard speech benchmark.
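
To make the idea of target decomposition concrete, the sketch below (ours, not from the paper; the function name, the gram set, and the toy example are illustrative assumptions) enumerates every way a target string can be split into units drawn from a given gram set. This dynamic program only exposes the space of candidate decompositions; the Gram-CTC loss itself marginalizes over such decompositions during training via a forward-backward procedure rather than enumerating them explicitly.

def decompositions(target, grams, max_gram_len=3):
    """Return all ways to split `target` into units drawn from `grams`."""
    # memo[i] holds all decompositions of the suffix target[i:];
    # the empty suffix has exactly one decomposition: the empty one.
    memo = {len(target): [[]]}

    def solve(i):
        if i in memo:
            return memo[i]
        results = []
        # Try every gram that could start at position i.
        for j in range(i + 1, min(i + max_gram_len, len(target)) + 1):
            piece = target[i:j]
            if piece in grams:
                results.extend([piece] + rest for rest in solve(j))
        memo[i] = results
        return results

    return solve(0)

# A toy gram set (hypothetical): all single characters plus a few
# multi-character grams, standing in for the learned unit inventory.
grams = set("hello") | {"he", "ll", "llo", "lo"}
for d in decompositions("hello", grams):
    print(d)
# e.g. ['h', 'e', 'l', 'l', 'o'], ['he', 'll', 'o'], ['h', 'e', 'llo'], ...

Because single characters are kept in the gram set, every target string has at least one valid decomposition, which mirrors why Gram-CTC can fall back to character-level output when no longer gram fits.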

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-liu17f,
  title     = {{G}ram-{CTC}: Automatic Unit Selection and Target Decomposition for Sequence Labelling},
  author    = {Hairong Liu and Zhenyao Zhu and Xiangang Li and Sanjeev Satheesh},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {2188--2197},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/liu17f/liu17f.pdf},
  url       = {https://proceedings.mlr.press/v70/liu17f.html}
}
Endnote
%0 Conference Paper
%T Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling
%A Hairong Liu
%A Zhenyao Zhu
%A Xiangang Li
%A Sanjeev Satheesh
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-liu17f
%I PMLR
%P 2188--2197
%U https://proceedings.mlr.press/v70/liu17f.html
%V 70
APA
Liu, H., Zhu, Z., Li, X. & Satheesh, S. (2017). Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2188-2197. Available from https://proceedings.mlr.press/v70/liu17f.html.