Universal priors: solving empirical Bayes via Bayesian inference and pretraining

Nick Cannella, Anzo Teh, Yanjun Han, Yury Polyanskiy
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:896-937, 2026.

Abstract

We theoretically justify the recent empirical finding of Teh et al. (2025) that a transformer pretrained on synthetically generated data achieves strong performance on empirical Bayes (EB) problems. We take an indirect approach to this question: rather than analyzing the model architecture or training dynamics, we ask why a pretrained Bayes estimator, trained under a prespecified training distribution, can adapt to arbitrary test distributions. Focusing on Poisson EB problems, we identify the existence of universal priors such that training under these priors yields a near-optimal regret bound of $\widetilde{O}(\frac{1}{n})$ uniformly over all test distributions. Our analysis leverages the classical phenomenon of posterior contraction in Bayesian statistics, showing that the pretrained Bayes estimator adapts to unknown test distributions precisely through posterior contraction. This perspective also explains the phenomenon of length generalization, in which the test sequence length exceeds the training length, as the model performs Bayesian inference using a fractional posterior.
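For orientation, here is a minimal sketch of the Poisson EB setting the abstract refers to, assuming the standard formulation in which $\theta \sim G$ and $X \mid \theta \sim \mathrm{Poisson}(\theta)$. The helper names and the two-atom prior are hypothetical and are not taken from the paper. The code only computes the oracle Bayes estimator $E[\theta \mid X = x]$ for a known prior, which is the usual benchmark against which regret is measured. It does not implement the pretrained estimator or the universal priors.

```python
import math

def poisson_pmf(x, theta):
    """P(X = x) for X ~ Poisson(theta), computed in log space for stability."""
    if theta == 0.0:
        return 1.0 if x == 0 else 0.0
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))

def marginal_pmf(x, atoms, weights):
    """Mixture pmf f_G(x) = sum_j w_j * Poisson(x; theta_j) for a discrete prior G."""
    return sum(w * poisson_pmf(x, t) for t, w in zip(atoms, weights))

def bayes_estimator(x, atoms, weights):
    """Posterior mean E[theta | X = x] under the discrete prior G."""
    num = sum(w * t * poisson_pmf(x, t) for t, w in zip(atoms, weights))
    return num / marginal_pmf(x, atoms, weights)

# Hypothetical two-atom prior, chosen only for illustration.
atoms = [1.0, 5.0]
weights = [0.5, 0.5]

for x in range(4):
    direct = bayes_estimator(x, atoms, weights)
    # Classical Poisson identity: E[theta | X = x] = (x + 1) f_G(x + 1) / f_G(x).
    via_marginal = (x + 1) * marginal_pmf(x + 1, atoms, weights) / marginal_pmf(x, atoms, weights)
    print(x, direct, via_marginal)
```

The two printed columns agree because the Poisson posterior mean depends on the prior only through the marginal pmf $f_G$. This is why an estimator that never sees $G$ can still hope to approach the Bayes risk.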

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-cannella26a,
  title = {Universal priors: solving empirical Bayes via Bayesian inference and pretraining},
  author = {Cannella, Nick and Teh, Anzo and Han, Yanjun and Polyanskiy, Yury},
  booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = {896--937},
  year = {2026},
  editor = {Hanneke, Steve and Lattimore, Tor},
  volume = {336},
  series = {Proceedings of Machine Learning Research},
  month = {29 Jun--03 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/cannella26a/cannella26a.pdf},
  url = {https://proceedings.mlr.press/v336/cannella26a.html},
  abstract = {We theoretically justify the recent empirical finding of Teh et al. (2025) that a transformer pretrained on synthetically generated data achieves strong performance on empirical Bayes (EB) problems. We take an indirect approach to this question: rather than analyzing the model architecture or training dynamics, we ask why a pretrained Bayes estimator, trained under a prespecified training distribution, can adapt to arbitrary test distributions. Focusing on Poisson EB problems, we identify the existence of universal priors such that training under these priors yields a near-optimal regret bound of $\widetilde{O}(\frac{1}{n})$ uniformly over all test distributions. Our analysis leverages the classical phenomenon of posterior contraction in Bayesian statistics, showing that the pretrained Bayes estimator adapts to unknown test distributions precisely through posterior contraction. This perspective also explains the phenomenon of length generalization, in which the test sequence length exceeds the training length, as the model performs Bayesian inference using a fractional posterior.}
}
Endnote
%0 Conference Paper
%T Universal priors: solving empirical Bayes via Bayesian inference and pretraining
%A Nick Cannella
%A Anzo Teh
%A Yanjun Han
%A Yury Polyanskiy
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-cannella26a
%I PMLR
%P 896--937
%U https://proceedings.mlr.press/v336/cannella26a.html
%V 336
%X We theoretically justify the recent empirical finding of Teh et al. (2025) that a transformer pretrained on synthetically generated data achieves strong performance on empirical Bayes (EB) problems. We take an indirect approach to this question: rather than analyzing the model architecture or training dynamics, we ask why a pretrained Bayes estimator, trained under a prespecified training distribution, can adapt to arbitrary test distributions. Focusing on Poisson EB problems, we identify the existence of universal priors such that training under these priors yields a near-optimal regret bound of $\widetilde{O}(\frac{1}{n})$ uniformly over all test distributions. Our analysis leverages the classical phenomenon of posterior contraction in Bayesian statistics, showing that the pretrained Bayes estimator adapts to unknown test distributions precisely through posterior contraction. This perspective also explains the phenomenon of length generalization, in which the test sequence length exceeds the training length, as the model performs Bayesian inference using a fractional posterior.
APA
Cannella, N., Teh, A., Han, Y. & Polyanskiy, Y. (2026). Universal priors: solving empirical Bayes via Bayesian inference and pretraining. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:896-937. Available from https://proceedings.mlr.press/v336/cannella26a.html.
