Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks

Dongwoo Lee, Dong Bok Lee, Steven Adriaensen, Juho Lee, Sung Ju Hwang, Frank Hutter, Seon Joo Kim, Hae Beom Lee
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:33243-33275, 2025.

Abstract

Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow a power law and have proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems, such as determining the expected performance improvements achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both existing point-estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
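
To make the prior-sampling idea concrete, below is a minimal sketch of how synthetic scaling-law curves could be drawn as prior data for PFN meta-training. The saturating power-law form y = c + a * x^(-b), the parameter ranges, and the noise level are illustrative assumptions, not the authors' actual prior:

    # A minimal sketch (assumed form, not the paper's exact prior):
    # draw random saturating power-law curves y = c + a * x^(-b) + noise,
    # each serving as one synthetic task for meta-learning extrapolation.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_scaling_curve(n_points=32):
        """Sample one synthetic scaling-law curve with observation noise."""
        a = rng.uniform(0.5, 5.0)    # power-law scale (assumed range)
        b = rng.uniform(0.1, 1.0)    # decay exponent (assumed range)
        c = rng.uniform(0.0, 0.5)    # irreducible-loss floor (assumed range)
        x = np.sort(rng.uniform(1.0, 1e4, n_points))  # e.g. training-set sizes
        y = c + a * x ** (-b) + rng.normal(0.0, 0.01, n_points)
        return x, y

    # A PFN would be trained across many such sampled curves to predict the
    # held-out large-x points from the observed small-x points.
    x, y = sample_scaling_curve()
    context, target = slice(0, 24), slice(24, None)
    print(x[context].shape, y[target].shape)  # (24,) (8,)

Training on infinitely many such sampled tasks is what lets the PFN amortize Bayesian inference over the prior, so that at test time it outputs a predictive distribution (not just a point estimate) for the loss at larger scales.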

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-lee25q,
  title     = {{B}ayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks},
  author    = {Lee, Dongwoo and Lee, Dong Bok and Adriaensen, Steven and Lee, Juho and Hwang, Sung Ju and Hutter, Frank and Kim, Seon Joo and Lee, Hae Beom},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {33243--33275},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lee25q/lee25q.pdf},
  url       = {https://proceedings.mlr.press/v267/lee25q.html},
  abstract  = {Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow a power law and have proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems, such as determining the expected performance improvements achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both existing point-estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.}
}
Endnote
%0 Conference Paper
%T Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks
%A Dongwoo Lee
%A Dong Bok Lee
%A Steven Adriaensen
%A Juho Lee
%A Sung Ju Hwang
%A Frank Hutter
%A Seon Joo Kim
%A Hae Beom Lee
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lee25q
%I PMLR
%P 33243--33275
%U https://proceedings.mlr.press/v267/lee25q.html
%V 267
%X Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow a power law and have proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems, such as determining the expected performance improvements achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both existing point-estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
APA
Lee, D., Lee, D. B., Adriaensen, S., Lee, J., Hwang, S. J., Hutter, F., Kim, S. J., & Lee, H. B. (2025). Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:33243-33275. Available from https://proceedings.mlr.press/v267/lee25q.html.