No Free Lunch from Random Feature Ensembles: Scaling Laws and Near-Optimality Conditions

Benjamin Samuel Ruben, William Lingxiao Tong, Hamza Tahir Chaudhry, Cengiz Pehlevan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:52198-52224, 2025.

Abstract

Given a fixed budget for total model size, one must choose between training a single large model or combining the predictions of multiple smaller models. We investigate this trade-off for ensembles of random-feature ridge regression models in both the overparameterized and underparameterized regimes. Using deterministic equivalent risk estimates, we prove that when a fixed number of parameters is distributed among $K$ independently trained models, the ridge-optimized test risk increases with $K$. Consequently, a single large model achieves optimal performance. We then ask when ensembles can achieve near-optimal performance. In the overparameterized regime, we show that, to leading order, the test error depends on ensemble size and model size only through the total feature count, so that overparameterized ensembles consistently achieve near-optimal performance. To understand underparameterized ensembles, we derive scaling laws for the test risk as a function of total parameter count when the ensemble size and parameters per ensemble member are jointly scaled according to a “growth exponent” $\ell$. While the optimal error scaling is always achieved by increasing model size with a fixed ensemble size, our analysis identifies conditions on the kernel and task eigenstructure under which near-optimal scaling laws can be obtained by joint scaling of ensemble size and model size.
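
As a concrete illustration of the trade-off described above, the following is a minimal numerical sketch (not the paper's deterministic-equivalent analysis or experimental setup): a fixed budget of random features is split across $K$ independently trained ridge regressors whose predictions are averaged. The ReLU feature map, Gaussian data, linear target, and all constants are illustrative assumptions, not taken from the paper.

# Minimal sketch (assumptions throughout): split a fixed feature budget N_TOT
# across K random-feature ridge regressors and average their predictions.
import numpy as np

rng = np.random.default_rng(0)
D, P_TRAIN, P_TEST, N_TOT, RIDGE = 20, 400, 2000, 512, 1e-3

# Synthetic regression task: y = <w*, x> with Gaussian inputs (an assumption).
w_star = rng.standard_normal(D) / np.sqrt(D)
X_train = rng.standard_normal((P_TRAIN, D))
X_test = rng.standard_normal((P_TEST, D))
y_train, y_test = X_train @ w_star, X_test @ w_star

def features(X, W):
    """Random ReLU features: phi(x) = relu(W x) / sqrt(N)."""
    return np.maximum(W @ X.T, 0.0).T / np.sqrt(W.shape[0])

def ensemble_test_error(K):
    """Train K independent members, each with N_TOT // K features; average their predictions."""
    N = N_TOT // K
    preds = np.zeros(P_TEST)
    for _ in range(K):
        W = rng.standard_normal((N, D))  # independent feature draw per ensemble member
        Phi_tr, Phi_te = features(X_train, W), features(X_test, W)
        # Ridge-regression readout for this member.
        a = np.linalg.solve(Phi_tr.T @ Phi_tr + RIDGE * np.eye(N), Phi_tr.T @ y_train)
        preds += Phi_te @ a / K
    return np.mean((preds - y_test) ** 2)

for K in (1, 2, 4, 8, 16):
    print(f"K={K:2d}  member size {N_TOT // K:4d}  test MSE={ensemble_test_error(K):.4f}")

Sweeping $K$ under a fixed total feature budget in this way compares a single large model ($K=1$) against ensembles of smaller models, mirroring the comparison the paper carries out analytically with ridge-optimized risk.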

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-ruben25a,
  title     = {No Free Lunch from Random Feature Ensembles: Scaling Laws and Near-Optimality Conditions},
  author    = {Ruben, Benjamin Samuel and Tong, William Lingxiao and Chaudhry, Hamza Tahir and Pehlevan, Cengiz},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {52198--52224},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/ruben25a/ruben25a.pdf},
  url       = {https://proceedings.mlr.press/v267/ruben25a.html},
  abstract  = {Given a fixed budget for total model size, one must choose between training a single large model or combining the predictions of multiple smaller models. We investigate this trade-off for ensembles of random-feature ridge regression models in both the overparameterized and underparameterized regimes. Using deterministic equivalent risk estimates, we prove that when a fixed number of parameters is distributed among $K$ independently trained models, the ridge-optimized test risk increases with $K$. Consequently, a single large model achieves optimal performance. We then ask when ensembles can achieve near-optimal performance. In the overparameterized regime, we show that, to leading order, the test error depends on ensemble size and model size only through the total feature count, so that overparameterized ensembles consistently achieve near-optimal performance. To understand underparameterized ensembles, we derive scaling laws for the test risk as a function of total parameter count when the ensemble size and parameters per ensemble member are jointly scaled according to a “growth exponent” $\ell$. While the optimal error scaling is always achieved by increasing model size with a fixed ensemble size, our analysis identifies conditions on the kernel and task eigenstructure under which near-optimal scaling laws can be obtained by joint scaling of ensemble size and model size.}
}
Endnote
%0 Conference Paper
%T No Free Lunch from Random Feature Ensembles: Scaling Laws and Near-Optimality Conditions
%A Benjamin Samuel Ruben
%A William Lingxiao Tong
%A Hamza Tahir Chaudhry
%A Cengiz Pehlevan
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-ruben25a
%I PMLR
%P 52198--52224
%U https://proceedings.mlr.press/v267/ruben25a.html
%V 267
%X Given a fixed budget for total model size, one must choose between training a single large model or combining the predictions of multiple smaller models. We investigate this trade-off for ensembles of random-feature ridge regression models in both the overparameterized and underparameterized regimes. Using deterministic equivalent risk estimates, we prove that when a fixed number of parameters is distributed among $K$ independently trained models, the ridge-optimized test risk increases with $K$. Consequently, a single large model achieves optimal performance. We then ask when ensembles can achieve near-optimal performance. In the overparameterized regime, we show that, to leading order, the test error depends on ensemble size and model size only through the total feature count, so that overparameterized ensembles consistently achieve near-optimal performance. To understand underparameterized ensembles, we derive scaling laws for the test risk as a function of total parameter count when the ensemble size and parameters per ensemble member are jointly scaled according to a “growth exponent” $\ell$. While the optimal error scaling is always achieved by increasing model size with a fixed ensemble size, our analysis identifies conditions on the kernel and task eigenstructure under which near-optimal scaling laws can be obtained by joint scaling of ensemble size and model size.
APA
Ruben, B.S., Tong, W.L., Chaudhry, H.T., & Pehlevan, C. (2025). No Free Lunch from Random Feature Ensembles: Scaling Laws and Near-Optimality Conditions. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:52198-52224. Available from https://proceedings.mlr.press/v267/ruben25a.html.
