Analysis of Variational Bayesian Factorizations for Sparse and Low-Rank Estimation
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:926-935, 2016.
Variational Bayesian (VB) approximations anchor a wide variety of probabilistic models, where tractable posterior inference is almost never possible. Typically based on the so-called VB mean-field approximation to the Kullback-Leibler divergence, a posterior distribution is sought that factorizes across groups of latent variables such that, with the distributions of all but one group of variables held fixed, an optimal closed-form distribution can be obtained for the remaining group, with differing algorithms distinguished by how different variables are grouped and ultimately factored. This basic strategy is particularly attractive when estimating structured low-dimensional models of high-dimensional data, exemplified by the search for minimal rank and/or sparse approximations to observed data. To this end, VB models are frequently deployed across applications including multi-task learning, robust PCA, subspace clustering, matrix completion, affine rank minimization, source localization, compressive sensing, and assorted combinations thereof. Perhaps surprisingly however, there exists almost no attendant theoretical explanation for how various VB factorizations operate, and in which situations one may be preferable to another. We address this relative void by comparing arguably two of the most popular factorizations, one built upon Gaussian scale mixture priors, the other bilinear Gaussian priors, both of which can favor minimal rank or sparsity depending on the context. More specifically, by reexpressing the respective VB objective functions, we weigh multiple factors related to local minima avoidance, feature transformation invariance and correlation, and computational complexity to arrive at insightful conclusions useful in explaining performance and deciding which VB flavor is advantageous. We also envision that the principles explored here are quite relevant to other structured inverse problems where VB serves as a viable solution.