Rate-Regularization and Generalization in Variational Autoencoders

Alican Bozkurt, Babak Esmaeili, Jean-Baptiste Tristan, Dana Brooks, Jennifer Dy, Jan-Willem van de Meent
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3880-3888, 2021.

Abstract

Variational autoencoders (VAEs) optimize an objective that comprises a reconstruction loss (the distortion) and a KL term (the rate). The rate is an upper bound on the mutual information, which is often interpreted as a regularizer that controls the degree of compression. We here examine whether inclusion of the rate term also improves generalization. We perform rate-distortion analyses in which we control the strength of the rate term, the network capacity, and the difficulty of the generalization problem. Lowering the strength of the rate term paradoxically improves generalization in most settings, and reducing the mutual information typically leads to underfitting. Moreover, we show that generalization performance continues to improve even after the mutual information saturates, indicating that the gap on the bound (i.e. the KL divergence relative to the inference marginal) affects generalization. This suggests that the standard spherical Gaussian prior is not an inductive bias that typically improves generalization, prompting further work to understand what choices of priors improve generalization in VAEs.
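As a minimal sketch of the objective described in the abstract, the snippet below computes a distortion term plus a weighted rate term for a single example. The abstract confirms only the standard spherical Gaussian prior. The rest are assumptions made for illustration and may not match the paper's setup: the function names, the weight `beta` standing in for "the strength of the rate term", the diagonal Gaussian encoder, and the Bernoulli decoder.

```python
import math


def gaussian_rate(mu, log_var):
    """Rate: KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats.

    Assumes a diagonal Gaussian encoder (hypothetical choice) and the
    standard spherical Gaussian prior mentioned in the abstract.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def bernoulli_distortion(x, x_prob, eps=1e-7):
    """Distortion: negative Bernoulli log-likelihood of x (assumed decoder)."""
    total = 0.0
    for xi, pi in zip(x, x_prob):
        p = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= xi * math.log(p) + (1.0 - xi) * math.log(1.0 - p)
    return total


def weighted_objective(x, x_prob, mu, log_var, beta=1.0):
    """Return (distortion + beta * rate, distortion, rate) for one example.

    `beta` is a hypothetical name for the strength of the rate term;
    beta = 1 corresponds to the usual negative ELBO for this model.
    """
    d = bernoulli_distortion(x, x_prob)
    r = gaussian_rate(mu, log_var)
    return d + beta * r, d, r


if __name__ == "__main__":
    loss, d, r = weighted_objective(
        x=[1.0, 0.0], x_prob=[0.5, 0.5], mu=[1.0, 1.0], log_var=[0.0, 0.0], beta=0.5
    )
    print(f"distortion={d:.4f} rate={r:.4f} objective={loss:.4f}")
```

Lowering `beta` in this sketch reduces the penalty on the rate, which is the knob the abstract refers to when it discusses lowering the strength of the rate term.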

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-bozkurt21a,
  title = {Rate-Regularization and Generalization in Variational Autoencoders},
  author = {Bozkurt, Alican and Esmaeili, Babak and Tristan, Jean-Baptiste and Brooks, Dana and Dy, Jennifer and van de Meent, Jan-Willem},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = {3880--3888},
  year = {2021},
  editor = {Banerjee, Arindam and Fukumizu, Kenji},
  volume = {130},
  series = {Proceedings of Machine Learning Research},
  month = {13--15 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v130/bozkurt21a/bozkurt21a.pdf},
  url = {http://proceedings.mlr.press/v130/bozkurt21a.html},
  abstract = {Variational autoencoders (VAEs) optimize an objective that comprises a reconstruction loss (the distortion) and a KL term (the rate). The rate is an upper bound on the mutual information, which is often interpreted as a regularizer that controls the degree of compression. We here examine whether inclusion of the rate term also improves generalization. We perform rate-distortion analyses in which we control the strength of the rate term, the network capacity, and the difficulty of the generalization problem. Lowering the strength of the rate term paradoxically improves generalization in most settings, and reducing the mutual information typically leads to underfitting. Moreover, we show that generalization performance continues to improve even after the mutual information saturates, indicating that the gap on the bound (i.e. the KL divergence relative to the inference marginal) affects generalization. This suggests that the standard spherical Gaussian prior is not an inductive bias that typically improves generalization, prompting further work to understand what choices of priors improve generalization in VAEs.}
}
Endnote
%0 Conference Paper
%T Rate-Regularization and Generalization in Variational Autoencoders
%A Alican Bozkurt
%A Babak Esmaeili
%A Jean-Baptiste Tristan
%A Dana Brooks
%A Jennifer Dy
%A Jan-Willem van de Meent
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-bozkurt21a
%I PMLR
%P 3880--3888
%U http://proceedings.mlr.press/v130/bozkurt21a.html
%V 130
%X Variational autoencoders (VAEs) optimize an objective that comprises a reconstruction loss (the distortion) and a KL term (the rate). The rate is an upper bound on the mutual information, which is often interpreted as a regularizer that controls the degree of compression. We here examine whether inclusion of the rate term also improves generalization. We perform rate-distortion analyses in which we control the strength of the rate term, the network capacity, and the difficulty of the generalization problem. Lowering the strength of the rate term paradoxically improves generalization in most settings, and reducing the mutual information typically leads to underfitting. Moreover, we show that generalization performance continues to improve even after the mutual information saturates, indicating that the gap on the bound (i.e. the KL divergence relative to the inference marginal) affects generalization. This suggests that the standard spherical Gaussian prior is not an inductive bias that typically improves generalization, prompting further work to understand what choices of priors improve generalization in VAEs.
APA
Bozkurt, A., Esmaeili, B., Tristan, J.-B., Brooks, D., Dy, J., & van de Meent, J.-W. (2021). Rate-Regularization and Generalization in Variational Autoencoders. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:3880-3888. Available from http://proceedings.mlr.press/v130/bozkurt21a.html.
