Fixing a Broken ELBO

Alexander Alemi, Ben Poole, Ian Fischer, Joshua Dillon, Rif A. Saurous, Kevin Murphy
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:159-168, 2018.

Abstract

Recent work in unsupervised representation learning has focused on learning deep directed latent-variable models. Fitting these models by maximizing the marginal likelihood or evidence is typically intractable, so a common approximation is to maximize the evidence lower bound (ELBO) instead. However, maximum likelihood training (whether exact or approximate) does not necessarily result in a good latent representation, as we demonstrate both theoretically and empirically. In particular, we derive variational lower and upper bounds on the mutual information between the input and the latent variable, and use these bounds to derive a rate-distortion curve that characterizes the tradeoff between compression and reconstruction accuracy. Using this framework, we demonstrate that there is a family of models with identical ELBO, but different quantitative and qualitative characteristics. Our framework also suggests a simple new method to ensure that latent-variable models with powerful stochastic decoders do not ignore their latent code.
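
A sketch of the decomposition behind these claims, written in standard VAE notation rather than quoted from the paper (the symbols here, namely the encoder q(z|x), decoder d(x|z), variational marginal m(z), data distribution p^*(x), and data entropy H, are assumptions reconstructed from the abstract):

    R = \mathbb{E}_{p^*(x)}\big[\mathrm{KL}\!\left(q(z \mid x) \,\|\, m(z)\right)\big],
    \qquad
    D = -\,\mathbb{E}_{p^*(x)}\,\mathbb{E}_{q(z \mid x)}\big[\log d(x \mid z)\big],

    \mathrm{ELBO} = -(D + R),
    \qquad
    H - D \;\le\; I(X;Z) \;\le\; R.

Under this reading, every rate-distortion pair on the line D + R = const attains the same ELBO, which is the sense in which models with identical ELBO can differ qualitatively; reweighting the rate term as D + \beta R with \beta \ne 1 instead targets a particular point on the rate-distortion curve, for example one with strictly positive rate, so that a powerful stochastic decoder cannot ignore its latent code.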

Cite this Paper


BibTeX

@InProceedings{pmlr-v80-alemi18a,
  title     = {Fixing a Broken {ELBO}},
  author    = {Alemi, Alexander and Poole, Ben and Fischer, Ian and Dillon, Joshua and Saurous, Rif A. and Murphy, Kevin},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {159--168},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/alemi18a/alemi18a.pdf},
  url       = {https://proceedings.mlr.press/v80/alemi18a.html}
}
Endnote
%0 Conference Paper
%T Fixing a Broken ELBO
%A Alexander Alemi
%A Ben Poole
%A Ian Fischer
%A Joshua Dillon
%A Rif A. Saurous
%A Kevin Murphy
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-alemi18a
%I PMLR
%P 159--168
%U https://proceedings.mlr.press/v80/alemi18a.html
%V 80
APA
Alemi, A., Poole, B., Fischer, I., Dillon, J., Saurous, R.A. & Murphy, K. (2018). Fixing a Broken ELBO. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:159-168. Available from https://proceedings.mlr.press/v80/alemi18a.html.
