The LORACs Prior for VAEs: Letting the Trees Speak for the Data

Sharad Vikram, Matthew D. Hoffman, Matthew J. Johnson
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:3292-3301, 2019.

Abstract

In variational autoencoders, the prior on the latent codes $z$ is often treated as an afterthought, but the prior shapes the kind of latent representation that the model learns. If the goal is to learn a representation that is interpretable and useful, then the prior should reflect the ways in which the high-level factors that describe the data vary. The “default” prior is a standard normal, but if the natural factors of variation in the dataset exhibit discrete structure or are not independent, then the isotropic-normal prior will actually encourage learning representations that \emph{mask} this structure. To alleviate this problem, we propose using a flexible Bayesian nonparametric hierarchical clustering prior based on the time-marginalized coalescent (TMC). To scale learning to large datasets, we develop a new inducing-point approximation and inference algorithm. We then apply the method without supervision to several datasets and examine the interpretability and practical performance of the inferred hierarchies and learned latent space.
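The abstract's point about the "default" prior can be made concrete: with an isotropic-normal prior, the VAE objective penalizes the approximate posterior by its KL divergence from N(0, I), which pulls latent codes toward a single undifferentiated blob regardless of any cluster structure in the data. Below is a minimal sketch of that closed-form KL term (standard VAE math, not code from the paper; the function name is illustrative):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), computed in closed form.

    This is the regularization term a standard-normal prior contributes
    to the VAE objective for a diagonal-Gaussian encoder.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# A posterior that exactly matches the prior pays zero penalty:
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))
# Posteriors centered away from the origin (e.g. distinct clusters) pay more:
print(kl_to_standard_normal(np.full(4, 2.0), np.zeros(4)))
```

Because every data point pays this same pull toward the origin, separating discrete groups in latent space is discouraged, which is the masking effect the TMC prior is designed to avoid.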

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-vikram19a,
  title     = {The LORACs Prior for VAEs: Letting the Trees Speak for the Data},
  author    = {Vikram, Sharad and Hoffman, Matthew D. and Johnson, Matthew J.},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {3292--3301},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/vikram19a/vikram19a.pdf},
  url       = {https://proceedings.mlr.press/v89/vikram19a.html},
  abstract  = {In variational autoencoders, the prior on the latent codes $z$ is often treated as an afterthought, but the prior shapes the kind of latent representation that the model learns. If the goal is to learn a representation that is interpretable and useful, then the prior should reflect the ways in which the high-level factors that describe the data vary. The “default” prior is a standard normal, but if the natural factors of variation in the dataset exhibit discrete structure or are not independent, then the isotropic-normal prior will actually encourage learning representations that \emph{mask} this structure. To alleviate this problem, we propose using a flexible Bayesian nonparametric hierarchical clustering prior based on the time-marginalized coalescent (TMC). To scale learning to large datasets, we develop a new inducing-point approximation and inference algorithm. We then apply the method without supervision to several datasets and examine the interpretability and practical performance of the inferred hierarchies and learned latent space.}
}
Endnote
%0 Conference Paper
%T The LORACs Prior for VAEs: Letting the Trees Speak for the Data
%A Sharad Vikram
%A Matthew D. Hoffman
%A Matthew J. Johnson
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-vikram19a
%I PMLR
%P 3292--3301
%U https://proceedings.mlr.press/v89/vikram19a.html
%V 89
%X In variational autoencoders, the prior on the latent codes $z$ is often treated as an afterthought, but the prior shapes the kind of latent representation that the model learns. If the goal is to learn a representation that is interpretable and useful, then the prior should reflect the ways in which the high-level factors that describe the data vary. The “default” prior is a standard normal, but if the natural factors of variation in the dataset exhibit discrete structure or are not independent, then the isotropic-normal prior will actually encourage learning representations that \emph{mask} this structure. To alleviate this problem, we propose using a flexible Bayesian nonparametric hierarchical clustering prior based on the time-marginalized coalescent (TMC). To scale learning to large datasets, we develop a new inducing-point approximation and inference algorithm. We then apply the method without supervision to several datasets and examine the interpretability and practical performance of the inferred hierarchies and learned latent space.
APA
Vikram, S., Hoffman, M.D., & Johnson, M.J. (2019). The LORACs Prior for VAEs: Letting the Trees Speak for the Data. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:3292-3301. Available from https://proceedings.mlr.press/v89/vikram19a.html.
