posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms

Måns Magnusson, Jakob Torgander, Paul-Christian Bürkner, Lu Zhang, Bob Carpenter, Aki Vehtari
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1198-1206, 2025.

Abstract

The general applicability and robustness of posterior inference algorithms is critical to widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new inference algorithm, whether it involves Monte Carlo sampling or variational approximation, the fundamental problem is evaluating its accuracy and efficiency across a range of representative target posteriors. To solve this problem, we propose posteriordb, a database of models and data sets defining target densities along with reference Monte Carlo draws. We further provide a guide to the best practices in using posteriordb for algorithm evaluation and comparison. To provide a wide range of realistic posteriors, posteriordb currently comprises 120 representative models with data, and has been instrumental in developing several inference algorithms.

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-magnusson25a,
  title = {posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms},
  author = {Magnusson, M{\aa}ns and Torgander, Jakob and B{\"u}rkner, Paul-Christian and Zhang, Lu and Carpenter, Bob and Vehtari, Aki},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {1198--1206},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/magnusson25a/magnusson25a.pdf},
  url = {https://proceedings.mlr.press/v258/magnusson25a.html},
  abstract = {The general applicability and robustness of posterior inference algorithms is critical to widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new inference algorithm, whether it involves Monte Carlo sampling or variational approximation, the fundamental problem is evaluating its accuracy and efficiency across a range of representative target posteriors. To solve this problem, we propose posteriordb, a database of models and data sets defining target densities along with reference Monte Carlo draws. We further provide a guide to the best practices in using posteriordb for algorithm evaluation and comparison. To provide a wide range of realistic posteriors, posteriordb currently comprises 120 representative models with data, and has been instrumental in developing several inference algorithms.}
}
Endnote
%0 Conference Paper
%T posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms
%A Måns Magnusson
%A Jakob Torgander
%A Paul-Christian Bürkner
%A Lu Zhang
%A Bob Carpenter
%A Aki Vehtari
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-magnusson25a
%I PMLR
%P 1198--1206
%U https://proceedings.mlr.press/v258/magnusson25a.html
%V 258
%X The general applicability and robustness of posterior inference algorithms is critical to widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new inference algorithm, whether it involves Monte Carlo sampling or variational approximation, the fundamental problem is evaluating its accuracy and efficiency across a range of representative target posteriors. To solve this problem, we propose posteriordb, a database of models and data sets defining target densities along with reference Monte Carlo draws. We further provide a guide to the best practices in using posteriordb for algorithm evaluation and comparison. To provide a wide range of realistic posteriors, posteriordb currently comprises 120 representative models with data, and has been instrumental in developing several inference algorithms.
APA
Magnusson, M., Torgander, J., Bürkner, P.-C., Zhang, L., Carpenter, B., & Vehtari, A. (2025). posteriordb: Testing, Benchmarking and Developing Bayesian Inference Algorithms. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1198-1206. Available from https://proceedings.mlr.press/v258/magnusson25a.html.
