Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification

Bo Pang, Ying Nian Wu
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8359-8370, 2021.

Abstract

We propose a latent space energy-based prior model for text generation and classification. The model stands on a generator network that generates the text sequence based on a continuous latent vector. The energy term of the prior model couples a continuous latent vector and a symbolic one-hot vector, so that the discrete category can be inferred from the observed example based on the continuous latent vector. Such a latent space coupling naturally enables incorporation of information bottleneck regularization to encourage the continuous latent vector to extract information from the observed example that is informative of the underlying category. In our learning method, the symbol-vector coupling, the generator network, and the inference network are learned jointly. Our model can be learned in an unsupervised setting where no category labels are provided. It can also be learned in a semi-supervised setting where category labels are provided for a subset of training examples. Our experiments demonstrate that the proposed model learns a well-structured and meaningful latent space, which (1) guides the generator to generate text with high quality, diversity, and interpretability, and (2) effectively classifies text.
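The coupling described above can be illustrated with a minimal sketch: if the energy term pairs a one-hot symbol y with a continuous latent z via an inner product ⟨y, f(z)⟩, then the category posterior p(y | z) reduces to a softmax over f(z). The network f below is a hypothetical stand-in with random weights, not the paper's learned model; dimensions K and D are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 8  # number of categories, latent dimensionality (illustrative)

# Toy one-layer "energy head" f: R^D -> R^K, standing in for the learned
# network that scores each category for a given continuous latent vector.
W = rng.normal(size=(K, D))
b = rng.normal(size=K)

def f(z):
    return W @ z + b

def p_y_given_z(z):
    """Softmax over the coupling energies <y, f(z)> for each one-hot y."""
    logits = f(z)
    logits = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

z = rng.normal(size=D)              # a continuous latent vector
probs = p_y_given_z(z)              # posterior over the K categories
category = int(np.argmax(probs))    # inferred discrete category
```

This shows only the symbol-vector side of the prior; the generator and inference networks, and the learning of f itself, are outside the scope of this sketch.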

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-pang21a,
  title     = {Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification},
  author    = {Pang, Bo and Wu, Ying Nian},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8359--8370},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/pang21a/pang21a.pdf},
  url       = {https://proceedings.mlr.press/v139/pang21a.html},
  abstract  = {We propose a latent space energy-based prior model for text generation and classification. The model stands on a generator network that generates the text sequence based on a continuous latent vector. The energy term of the prior model couples a continuous latent vector and a symbolic one-hot vector, so that discrete category can be inferred from the observed example based on the continuous latent vector. Such a latent space coupling naturally enables incorporation of information bottleneck regularization to encourage the continuous latent vector to extract information from the observed example that is informative of the underlying category. In our learning method, the symbol-vector coupling, the generator network and the inference network are learned jointly. Our model can be learned in an unsupervised setting where no category labels are provided. It can also be learned in semi-supervised setting where category labels are provided for a subset of training examples. Our experiments demonstrate that the proposed model learns well-structured and meaningful latent space, which (1) guides the generator to generate text with high quality, diversity, and interpretability, and (2) effectively classifies text.}
}
Endnote
%0 Conference Paper
%T Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification
%A Bo Pang
%A Ying Nian Wu
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-pang21a
%I PMLR
%P 8359--8370
%U https://proceedings.mlr.press/v139/pang21a.html
%V 139
%X We propose a latent space energy-based prior model for text generation and classification. The model stands on a generator network that generates the text sequence based on a continuous latent vector. The energy term of the prior model couples a continuous latent vector and a symbolic one-hot vector, so that discrete category can be inferred from the observed example based on the continuous latent vector. Such a latent space coupling naturally enables incorporation of information bottleneck regularization to encourage the continuous latent vector to extract information from the observed example that is informative of the underlying category. In our learning method, the symbol-vector coupling, the generator network and the inference network are learned jointly. Our model can be learned in an unsupervised setting where no category labels are provided. It can also be learned in semi-supervised setting where category labels are provided for a subset of training examples. Our experiments demonstrate that the proposed model learns well-structured and meaningful latent space, which (1) guides the generator to generate text with high quality, diversity, and interpretability, and (2) effectively classifies text.
APA
Pang, B. & Wu, Y.N. (2021). Latent Space Energy-Based Model of Symbol-Vector Coupling for Text Generation and Classification. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8359-8370. Available from https://proceedings.mlr.press/v139/pang21a.html.