SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

Yuhta Takida; Takashi Shibuya; Weihsiang Liao; Chieh-Hsin Lai; Junki Ohmura; Toshimitsu Uesaka; Naoki Murata; Shusuke Takahashi; Toshiyuki Kumakura; Yuki Mitsufuji

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20987-21012, 2022.

Abstract

One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend that the quantization is stochastic at the initial stage of the training but gradually converges toward a deterministic quantization, which we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-takida22a,
  title = 	 {{SQ}-{VAE}: Variational {B}ayes on Discrete Representation with Self-annealed Stochastic Quantization},
  author =       {Takida, Yuhta and Shibuya, Takashi and Liao, Weihsiang and Lai, Chieh-Hsin and Ohmura, Junki and Uesaka, Toshimitsu and Murata, Naoki and Takahashi, Shusuke and Kumakura, Toshiyuki and Mitsufuji, Yuki},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {20987--21012},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/takida22a/takida22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/takida22a.html},
  abstract = 	 {One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend that the quantization is stochastic at the initial stage of the training but gradually converges toward a deterministic quantization, which we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.}
}

Endnote

%0 Conference Paper
%T SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
%A Yuhta Takida
%A Takashi Shibuya
%A Weihsiang Liao
%A Chieh-Hsin Lai
%A Junki Ohmura
%A Toshimitsu Uesaka
%A Naoki Murata
%A Shusuke Takahashi
%A Toshiyuki Kumakura
%A Yuki Mitsufuji
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-takida22a
%I PMLR
%P 20987--21012
%U https://proceedings.mlr.press/v162/takida22a.html
%V 162
%X One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend that the quantization is stochastic at the initial stage of the training but gradually converges toward a deterministic quantization, which we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.

APA


Takida, Y., Shibuya, T., Liao, W., Lai, C.-H., Ohmura, J., Uesaka, T., Murata, N., Takahashi, S., Kumakura, T., & Mitsufuji, Y. (2022). SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:20987-21012. Available from https://proceedings.mlr.press/v162/takida22a.html.
