Proximity Variational Inference
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1961-1969, 2018.
Variational inference is a powerful approach for approximate posterior inference. However, it is sensitive to initialization and can be subject to poor local optima. In this paper, we develop proximity variational inference (PVI). PVI is a new method for optimizing the variational objective that constrains subsequent iterates of the variational parameters to robustify the optimization path. Consequently, PVI is less sensitive to initial- ization and optimization quirks and finds better local optima. We demonstrate our method on four proximity statistics. We study PVI on a Bernoulli factor model and sigmoid belief network fit to real and synthetic data and compare to deterministic annealing (Katahira et al., 2008). We highlight the flexibility of PVI by designing a proximity statistic for Bayesian deep learning models such as the variational autoencoder (Kingma and Welling, 2014; Rezende et al., 2014) and show that it gives better performance by reducing overpruning. PVI also yields improved predictions in a deep generative model of text. Empirically, we show that PVI consistently finds better local optima and gives better predictive performance.