SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

Yuhta Takida, Takashi Shibuya, Weihsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20987-21012, 2022.

Abstract

One noted issue with the vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, a problem known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called the stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend in which quantization is stochastic at the initial stage of training but gradually converges toward deterministic quantization, a behavior we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.
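The stochastic quantization and self-annealing described in the abstract can be illustrated with a minimal sketch, assuming a Gaussian-style soft assignment of an encoder output to codebook entries (a categorical distribution whose probabilities decay with squared distance, scaled by a variance term). The names (stochastic_quantize, codebook, var) and the toy values are illustrative, not taken from the authors' implementation.

import numpy as np

def stochastic_quantize(z, codebook, var, rng):
    # Squared Euclidean distance from the encoder output z to every
    # codebook entry; shape (K,).
    d2 = np.sum((codebook - z) ** 2, axis=1)
    # Categorical assignment probabilities: softmax of -d2 / (2 * var).
    logits = -d2 / (2.0 * var)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Sample one codebook index and return the selected code vector.
    k = rng.choice(len(codebook), p=p)
    return codebook[k], p

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K = 8 code vectors of dimension 4 (toy values)
z = rng.normal(size=4)               # a single encoder output

# Shrinking the variance mimics the self-annealing trend reported in the paper:
# assignments move from near-uniform (stochastic) toward one-hot (deterministic)
# quantization. In SQ-VAE the variance is learned rather than hand-scheduled.
for var in (10.0, 1.0, 0.01):
    _, p = stochastic_quantize(z, codebook, var, rng)
    print(f"var={var:5.2f}  max assignment probability={p.max():.3f}")

With a large variance the maximum assignment probability stays close to 1/K, while a small variance concentrates the distribution on the nearest code, recovering the deterministic nearest-neighbour quantization of VQ-VAE.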

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-takida22a,
  title     = {{SQ}-{VAE}: Variational {B}ayes on Discrete Representation with Self-annealed Stochastic Quantization},
  author    = {Takida, Yuhta and Shibuya, Takashi and Liao, Weihsiang and Lai, Chieh-Hsin and Ohmura, Junki and Uesaka, Toshimitsu and Murata, Naoki and Takahashi, Shusuke and Kumakura, Toshiyuki and Mitsufuji, Yuki},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {20987--21012},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/takida22a/takida22a.pdf},
  url       = {https://proceedings.mlr.press/v162/takida22a.html}
}
APA
Takida, Y., Shibuya, T., Liao, W., Lai, C., Ohmura, J., Uesaka, T., Murata, N., Takahashi, S., Kumakura, T. & Mitsufuji, Y. (2022). SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:20987-21012. Available from https://proceedings.mlr.press/v162/takida22a.html.
