Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:704-712, 2016.
Variational inference (VI) combined with data subsampling enables approximate posterior inference with large data sets for otherwise intractable models, but suffers from poor local optima. We first formulate a deterministic annealing approach for the generic class of conditionally conjugate exponential family models. This algorithm uses a temperature parameter that deterministically deforms the objective and reduces this parameter over the course of the optimization. A well-known drawback in annealing is the choice of the annealing schedule. We therefore introduce variational tempering, a variational algorithm that introduces a temperature latent variable to the model. In contrast to related work in the Markov chain Monte Carlo literature, this algorithm results in adaptive annealing schedules. Lastly, we develop local variational tempering, which assigns a latent temperature to each data point; this allows for dynamic annealing that varies across data. Compared to the traditional VI, all proposed approaches find improved predictive likelihoods on held-out data.