A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints

Michael Munn, Susan Wei
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:45256-45271, 2025.

Abstract

Recent advances in artificial intelligence have been fueled by the development of foundation models such as BERT, GPT, T5, and Vision Transformers. These models are first pretrained on vast and diverse datasets and then adapted to specific downstream tasks, often with significantly less data. However, the mechanisms behind the success of this ubiquitous pretrain-then-adapt paradigm remain underexplored, particularly the characteristics of pretraining checkpoints that enhance downstream adaptation. We introduce a Bayesian model selection criterion, called the downstream free energy, which quantifies a checkpoint’s adaptability by measuring the concentration of nearby favorable parameters for a downstream task. We demonstrate that this Bayesian model selection criterion can be effectively implemented without access to the downstream data or prior knowledge of the downstream task. Furthermore, we provide empirical evidence that the criterion reliably correlates with improved fine-tuning performance, offering a principled approach to predicting model adaptability.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-munn25a, title = {A {B}ayesian Model Selection Criterion for Selecting Pretraining Checkpoints}, author = {Munn, Michael and Wei, Susan}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {45256--45271}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/munn25a/munn25a.pdf}, url = {https://proceedings.mlr.press/v267/munn25a.html}, abstract = {Recent advances in artificial intelligence have been fueled by the development of foundation models such as BERT, GPT, T5, and Vision Transformers. These models are first pretrained on vast and diverse datasets and then adapted to specific downstream tasks, often with significantly less data. However, the mechanisms behind the success of this ubiquitous pretrain-then-adapt paradigm remain underexplored, particularly the characteristics of pretraining checkpoints that enhance downstream adaptation. We introduce a Bayesian model selection criterion, called the downstream free energy, which quantifies a checkpoint’s adaptability by measuring the concentration of nearby favorable parameters for a downstream task. We demonstrate that this Bayesian model selection criterion can be effectively implemented without access to the downstream data or prior knowledge of the downstream task. Furthermore, we provide empirical evidence that the criterion reliably correlates with improved fine-tuning performance, offering a principled approach to predicting model adaptability.} }
Endnote
%0 Conference Paper %T A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints %A Michael Munn %A Susan Wei %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-munn25a %I PMLR %P 45256--45271 %U https://proceedings.mlr.press/v267/munn25a.html %V 267 %X Recent advances in artificial intelligence have been fueled by the development of foundation models such as BERT, GPT, T5, and Vision Transformers. These models are first pretrained on vast and diverse datasets and then adapted to specific downstream tasks, often with significantly less data. However, the mechanisms behind the success of this ubiquitous pretrain-then-adapt paradigm remain underexplored, particularly the characteristics of pretraining checkpoints that enhance downstream adaptation. We introduce a Bayesian model selection criterion, called the downstream free energy, which quantifies a checkpoint’s adaptability by measuring the concentration of nearby favorable parameters for a downstream task. We demonstrate that this Bayesian model selection criterion can be effectively implemented without access to the downstream data or prior knowledge of the downstream task. Furthermore, we provide empirical evidence that the criterion reliably correlates with improved fine-tuning performance, offering a principled approach to predicting model adaptability.
APA
Munn, M. & Wei, S.. (2025). A Bayesian Model Selection Criterion for Selecting Pretraining Checkpoints. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:45256-45271 Available from https://proceedings.mlr.press/v267/munn25a.html.

Related Material