Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3881-3890, 2017.

Abstract

Recent work on generative text modeling has found that variational autoencoders (VAE) with LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s dilation architecture, we control the size of context from previously generated words. In experiments, we find that there is a trade-off between contextual capacity of the decoder and effective use of encoding information. We show that when carefully managed, VAEs can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive language modeling result with VAE. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines.
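To make the dilation/context trade-off concrete, below is a minimal sketch (not the authors' code) of a dilated causal CNN decoder conditioned on a VAE latent vector z, written in PyTorch. The class name, layer sizes, and dilation schedule are illustrative assumptions; the point is that stacking convolutions with dilations 1, 2, 4, ... grows the receptive field, so the schedule controls how many previously generated words the decoder can condition on.

# Minimal sketch (assumed architecture, not the paper's implementation):
# a dilated causal CNN decoder that injects the VAE latent z at every position.
import torch
import torch.nn as nn


class DilatedCNNDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, latent_dim=32,
                 channels=256, dilations=(1, 2, 4)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.z_proj = nn.Linear(latent_dim, channels)   # broadcast z across time steps
        self.in_proj = nn.Conv1d(emb_dim, channels, kernel_size=1)
        self.convs = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=2, dilation=d)
            for d in dilations
        ])
        self.out = nn.Linear(channels, vocab_size)

    def forward(self, tokens, z):
        # tokens: (batch, seq_len) previously generated words; z: (batch, latent_dim)
        x = self.embed(tokens).transpose(1, 2)           # (batch, emb_dim, seq_len)
        h = self.in_proj(x) + self.z_proj(z).unsqueeze(-1)
        for conv in self.convs:
            # left-pad so the convolution is causal: position t only sees positions <= t
            pad = (conv.kernel_size[0] - 1) * conv.dilation[0]
            h = torch.relu(conv(nn.functional.pad(h, (pad, 0)))) + h
        return self.out(h.transpose(1, 2))               # (batch, seq_len, vocab_size)


# With kernel size 2 and dilations (1, 2, 4), each output position has a receptive
# field of 8 input tokens; a shallower schedule gives the decoder less context,
# forcing it to rely more on the latent code z.
logits = DilatedCNNDecoder(vocab_size=10000)(
    torch.randint(0, 10000, (4, 20)), torch.randn(4, 32))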

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-yang17d,
  title     = {Improved Variational Autoencoders for Text Modeling using Dilated Convolutions},
  author    = {Zichao Yang and Zhiting Hu and Ruslan Salakhutdinov and Taylor Berg-Kirkpatrick},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {3881--3890},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/yang17d/yang17d.pdf},
  url       = {https://proceedings.mlr.press/v70/yang17d.html},
  abstract  = {Recent work on generative text modeling has found that variational autoencoders (VAE) with LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s dilation architecture, we control the size of context from previously generated words. In experiments, we find that there is a trade-off between contextual capacity of the decoder and effective use of encoding information. We show that when carefully managed, VAEs can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive language modeling result with VAE. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines.}
}
Endnote
%0 Conference Paper
%T Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
%A Zichao Yang
%A Zhiting Hu
%A Ruslan Salakhutdinov
%A Taylor Berg-Kirkpatrick
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-yang17d
%I PMLR
%P 3881--3890
%U https://proceedings.mlr.press/v70/yang17d.html
%V 70
%X Recent work on generative text modeling has found that variational autoencoders (VAE) with LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder’s dilation architecture, we control the size of context from previously generated words. In experiments, we find that there is a trade-off between contextual capacity of the decoder and effective use of encoding information. We show that when carefully managed, VAEs can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive language modeling result with VAE. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines.
APA
Yang, Z., Hu, Z., Salakhutdinov, R. & Berg-Kirkpatrick, T. (2017). Improved Variational Autoencoders for Text Modeling using Dilated Convolutions. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:3881-3890. Available from https://proceedings.mlr.press/v70/yang17d.html.