Fast Decoding in Sequence Models Using Discrete Latent Variables

Lukasz Kaiser, Samy Bengio, Aurko Roy, Ashish Vaswani, Niki Parmar, Jakob Uszkoreit, Noam Shazeer
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2390-2399, 2018.

Abstract

Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and the Transformer, are the state of the art on many tasks. However, they lack parallelism and are thus slow for long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and the Transformer are much more parallel during training but still lack parallelism during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallel. The main idea behind this approach is to first autoencode the target sequence into a shorter discrete latent sequence, which is generated autoregressively, and finally to decode the full sequence from this shorter latent sequence in a parallel manner. To this end, we introduce a new method for constructing discrete latent variables and compare it with previously introduced methods. Finally, we verify that our model works on the task of neural machine translation, where our models are an order of magnitude faster than comparable autoregressive models and, while lower in BLEU than purely autoregressive models, better than previously proposed non-autoregressive translation models.
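
To make the decoding scheme described above concrete, the following is a minimal Python sketch of the inference-time control flow only: an autoregressive loop over a short latent sequence, followed by a single parallel reconstruction of the full-length target. The component functions, vocabulary sizes, and the compression factor of 8 are illustrative placeholders, not the paper's actual Latent Transformer or its discretization bottleneck.

# Minimal sketch of the latent-variable decoding procedure described in the abstract.
# The model components below (latent_prior_step, parallel_decoder) are dummy
# stand-ins, NOT the paper's trained networks; only the control flow -- a short
# autoregressive latent loop followed by one parallel decode -- is what the
# abstract describes.

import numpy as np

rng = np.random.default_rng(0)

LATENT_VOCAB = 2 ** 14   # size of the discrete latent vocabulary (assumed)
TARGET_VOCAB = 32000     # size of the target-token vocabulary (assumed)
COMPRESSION = 8          # each latent summarizes ~8 target tokens (illustrative)


def latent_prior_step(source, latents_so_far):
    """Stand-in for one step of the autoregressive latent model p(l_i | l_<i, x)."""
    logits = rng.normal(size=LATENT_VOCAB)
    return int(np.argmax(logits))


def parallel_decoder(source, latents, target_len):
    """Stand-in for the parallel decoder p(y | l, x): emits all target tokens at once."""
    logits = rng.normal(size=(target_len, TARGET_VOCAB))
    return logits.argmax(axis=-1)


def fast_decode(source_tokens, target_len):
    # 1) Autoregressive loop, but only over the *short* latent sequence:
    #    target_len / COMPRESSION sequential steps instead of target_len steps.
    latent_len = max(1, target_len // COMPRESSION)
    latents = []
    for _ in range(latent_len):
        latents.append(latent_prior_step(source_tokens, latents))

    # 2) One parallel pass reconstructs the full-length target from the latents.
    return parallel_decoder(source_tokens, np.array(latents), target_len)


if __name__ == "__main__":
    src = rng.integers(0, TARGET_VOCAB, size=24)
    out = fast_decode(src, target_len=24)
    print(out.shape)  # (24,) -- only 3 sequential latent steps were needed

The point of the sketch is the step count: for a target of length n, only about n/8 sequential decoding steps are required under this illustrative compression factor, which is where the claimed order-of-magnitude decoding speedup comes from.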

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-kaiser18a,
  title = {Fast Decoding in Sequence Models Using Discrete Latent Variables},
  author = {Kaiser, Lukasz and Bengio, Samy and Roy, Aurko and Vaswani, Ashish and Parmar, Niki and Uszkoreit, Jakob and Shazeer, Noam},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages = {2390--2399},
  year = {2018},
  editor = {Dy, Jennifer and Krause, Andreas},
  volume = {80},
  series = {Proceedings of Machine Learning Research},
  month = {10--15 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v80/kaiser18a/kaiser18a.pdf},
  url = {https://proceedings.mlr.press/v80/kaiser18a.html},
  abstract = {Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and the Transformer, are the state of the art on many tasks. However, they lack parallelism and are thus slow for long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and the Transformer are much more parallel during training but still lack parallelism during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallel. The main idea behind this approach is to first autoencode the target sequence into a shorter discrete latent sequence, which is generated autoregressively, and finally to decode the full sequence from this shorter latent sequence in a parallel manner. To this end, we introduce a new method for constructing discrete latent variables and compare it with previously introduced methods. Finally, we verify that our model works on the task of neural machine translation, where our models are an order of magnitude faster than comparable autoregressive models and, while lower in BLEU than purely autoregressive models, better than previously proposed non-autoregressive translation models.}
}
Endnote
%0 Conference Paper
%T Fast Decoding in Sequence Models Using Discrete Latent Variables
%A Lukasz Kaiser
%A Samy Bengio
%A Aurko Roy
%A Ashish Vaswani
%A Niki Parmar
%A Jakob Uszkoreit
%A Noam Shazeer
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-kaiser18a
%I PMLR
%P 2390--2399
%U https://proceedings.mlr.press/v80/kaiser18a.html
%V 80
%X Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and the Transformer, are the state of the art on many tasks. However, they lack parallelism and are thus slow for long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and the Transformer are much more parallel during training but still lack parallelism during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallel. The main idea behind this approach is to first autoencode the target sequence into a shorter discrete latent sequence, which is generated autoregressively, and finally to decode the full sequence from this shorter latent sequence in a parallel manner. To this end, we introduce a new method for constructing discrete latent variables and compare it with previously introduced methods. Finally, we verify that our model works on the task of neural machine translation, where our models are an order of magnitude faster than comparable autoregressive models and, while lower in BLEU than purely autoregressive models, better than previously proposed non-autoregressive translation models.
APA
Kaiser, L., Bengio, S., Roy, A., Vaswani, A., Parmar, N., Uszkoreit, J. & Shazeer, N. (2018). Fast Decoding in Sequence Models Using Discrete Latent Variables. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2390-2399. Available from https://proceedings.mlr.press/v80/kaiser18a.html.