Massively Parallel Expectation Maximization For Approximate Posteriors

Thomas Heap, Sam Bowyer, Laurence Aitchison
Proceedings of the 7th Symposium on Advances in Approximate Bayesian Inference, PMLR 289:25-66, 2025.

Abstract

Bayesian inference for hierarchical models can be very challenging. MCMC methods have difficulty scaling to large models with many observations and latent variables. While variational inference (VI) and reweighted wake-sleep (RWS) can be more scalable, they are gradient-based methods and so often require many iterations to converge. Our key insight is that modern massively parallel importance weighting methods (Bowyer et al., 2024) give fast and accurate posterior moment estimates, and we can use these moment estimates to rapidly learn an approximate posterior. Specifically, we propose using expectation maximization to fit the approximate posterior, which we call QEM. The expectation step involves computing the posterior moments using high-quality massively parallel estimates from Bowyer et al. (2024). The maximization step involves fitting the approximate posterior using these moments, which can be done straightforwardly for simple approximate posteriors such as Gaussian, Gamma, Beta, Dirichlet, Binomial, Multinomial, Categorical, etc. (or combinations thereof). We show that QEM is faster than state-of-the-art, massively parallel variants of RWS and VI, and is invariant to reparameterizations of the model that dramatically slow down gradient-based methods.
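
To make the QEM recipe concrete, here is a minimal single-latent sketch in Python/NumPy. It is not the authors' implementation: the massively parallel moment estimator of Bowyer et al. (2024) is replaced by plain self-normalized importance sampling, the model is a hypothetical conjugate Gaussian toy (prior N(0, 1), likelihood N(z, 1), one observation), and the damping factor on the moment updates is an assumption; only the E-step/M-step structure comes from the abstract above.

import numpy as np

rng = np.random.default_rng(0)
x_obs = 1.5           # hypothetical observed datum
mu, sigma = 0.0, 1.0  # initial parameters of q(z) = N(mu, sigma^2)
K = 10_000            # number of importance samples per iteration
lr = 0.5              # damping on the moment-matching updates (assumption)

def log_joint(z):
    # log p(z) + log p(x_obs | z), up to additive constants
    return -0.5 * z**2 - 0.5 * (x_obs - z)**2

for _ in range(50):
    # E-step: self-normalized importance weights w_k ∝ p(z_k, x_obs) / q(z_k),
    # standing in for the massively parallel estimator used in the paper
    z = rng.normal(mu, sigma, size=K)
    log_q = -0.5 * ((z - mu) / sigma)**2 - np.log(sigma)
    log_w = log_joint(z) - log_q
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    m1 = np.sum(w * z)       # estimated posterior mean E[z]
    m2 = np.sum(w * z**2)    # estimated posterior second moment E[z^2]

    # M-step: moment-match the Gaussian q, damped for stability
    mu = (1 - lr) * mu + lr * m1
    sigma = (1 - lr) * sigma + lr * np.sqrt(max(m2 - m1**2, 1e-8))

print(f"QEM:   mu={mu:.3f}, sigma={sigma:.3f}")
print(f"Exact: mu={x_obs / 2:.3f}, sigma={0.5**0.5:.3f}")

Because the toy model is conjugate, the exact posterior is N(x_obs/2, 1/2), so the moment-matched q can be checked directly against mu = 0.75 and sigma ≈ 0.707.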

Cite this Paper


BibTeX
@InProceedings{pmlr-v289-heap25a,
  title     = {Massively Parallel Expectation Maximization For Approximate Posteriors},
  author    = {Heap, Thomas and Bowyer, Sam and Aitchison, Laurence},
  booktitle = {Proceedings of the 7th Symposium on Advances in Approximate Bayesian Inference},
  pages     = {25--66},
  year      = {2025},
  editor    = {Allingham, James Urquhart and Swaroop, Siddharth},
  volume    = {289},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Apr},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v289/main/assets/heap25a/heap25a.pdf},
  url       = {https://proceedings.mlr.press/v289/heap25a.html}
}
Endnote
%0 Conference Paper
%T Massively Parallel Expectation Maximization For Approximate Posteriors
%A Thomas Heap
%A Sam Bowyer
%A Laurence Aitchison
%B Proceedings of the 7th Symposium on Advances in Approximate Bayesian Inference
%C Proceedings of Machine Learning Research
%D 2025
%E James Urquhart Allingham
%E Siddharth Swaroop
%F pmlr-v289-heap25a
%I PMLR
%P 25--66
%U https://proceedings.mlr.press/v289/heap25a.html
%V 289
APA
Heap, T., Bowyer, S. & Aitchison, L. (2025). Massively Parallel Expectation Maximization For Approximate Posteriors. Proceedings of the 7th Symposium on Advances in Approximate Bayesian Inference, in Proceedings of Machine Learning Research 289:25-66. Available from https://proceedings.mlr.press/v289/heap25a.html.