Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?

Kyurae Kim, Yian Ma, Jacob Gardner
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:235-243, 2024.

Abstract

We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called “linear”) rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one that also covers misspecified variational families. Combined with previous work on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, onto which the projection is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve existing analyses of the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.
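
As a rough illustration of the ingredients discussed in the abstract, the following sketch (not the authors' code) implements a one-sample STL reparameterization gradient for a mean-field Gaussian family $q(z) = \mathcal{N}(m, \mathrm{diag}(s)^2)$ together with a $\Theta(d)$ projection that lower-bounds the scale entries, as one might use inside projected SGD. The function names, the scale floor, and the toy standard-Gaussian target are illustrative assumptions, not taken from the paper.

import numpy as np

def elbo_grad_stl(log_joint_grad, m, s, rng):
    """One-sample STL gradient estimate of the ELBO w.r.t. (m, s).

    log_joint_grad: callable returning grad_z log p(x, z)
    m, s: mean and diagonal scale of the variational Gaussian
    """
    eps = rng.standard_normal(m.shape)
    z = m + s * eps                      # reparameterized sample
    score_q = -(z - m) / s**2            # grad_z log q(z) with parameters "frozen"
    path = log_joint_grad(z) - score_q   # STL keeps only the path derivative;
                                         # the score-function term is dropped
    grad_m = path
    grad_s = path * eps
    return grad_m, grad_s

def project_scale(s, floor=1e-3):
    """Theta(d) projection: clip the diagonal scale entries from below."""
    return np.maximum(s, floor)

# Illustrative usage on a toy standard-Gaussian target, so grad_z log p(z) = -z.
rng = np.random.default_rng(0)
log_joint_grad = lambda z: -z
m, s = np.zeros(2), np.ones(2)
for t in range(1000):
    gm, gs = elbo_grad_stl(log_joint_grad, m, s, rng)
    m, s = m + 1e-2 * gm, project_scale(s + 1e-2 * gs)

When the family matches the target exactly (here m = 0, s = 1), the path-derivative term vanishes at every sample, which is the zero-variance-at-the-optimum property of the STL estimator underlying the geometric-rate result.
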

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-kim24a,
  title     = {Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?},
  author    = {Kim, Kyurae and Ma, Yian and Gardner, Jacob},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {235--243},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/kim24a/kim24a.pdf},
  url       = {https://proceedings.mlr.press/v238/kim24a.html},
  abstract  = {We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called “linear”) rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one that also covers misspecified variational families. Combined with previous work on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, onto which the projection is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve existing analyses of the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.}
}
Endnote
%0 Conference Paper
%T Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?
%A Kyurae Kim
%A Yian Ma
%A Jacob Gardner
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-kim24a
%I PMLR
%P 235--243
%U https://proceedings.mlr.press/v238/kim24a.html
%V 238
%X We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called “linear”) rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one that also covers misspecified variational families. Combined with previous work on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, onto which the projection is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve existing analyses of the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.
APA
Kim, K., Ma, Y. & Gardner, J. (2024). Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:235-243. Available from https://proceedings.mlr.press/v238/kim24a.html.