Simplicity Bias and Optimization Threshold in Two-Layer ReLU Networks
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:5241-5275, 2025.
Abstract
Understanding the generalization of overparametrized models remains a fundamental challenge in machine learning. The literature mostly studies generalization from an interpolation point of view, taking convergence towards a global minimum of the training loss for granted. This interpolation paradigm does not seem valid for complex tasks such as in-context learning or diffusion. It has instead been empirically observed that trained models go from global minima to spurious local minima of the training loss as the number of training samples exceeds a level we call the optimization threshold. This paper theoretically explores this phenomenon in the context of two-layer ReLU networks. We demonstrate that, despite overparametrization, networks may converge towards simpler solutions rather than interpolating the training data, which can lead to a drastic improvement in the test loss. Our analysis relies on the so-called early alignment phase, during which neurons align toward specific directions. This directional alignment leads to a simplicity bias, wherein the network approximates the ground-truth model without converging to the global minimum of the training loss. Our results suggest that this bias, which results in an optimization threshold beyond which interpolation is no longer reached, is beneficial and enhances the generalization of trained models.
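The early alignment phase mentioned above can be illustrated with a toy experiment. The sketch below is not the paper's setup: the data distribution, small-initialization scale, learning rate, and the alignment measure (cosine similarity with a label-weighted data direction) are all illustrative assumptions. It trains a two-layer ReLU network from a small initialization with plain gradient descent and checks that the neurons whose norms grow end up clustered around a common direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 50                                  # samples, input dim, hidden width

# Illustrative data: inputs in the positive orthant, positive linear labels
# (a hypothetical teacher v, not taken from the paper).
X = np.abs(rng.normal(size=(n, d)))
v = np.abs(rng.normal(size=d))
y = X @ v

scale = 1e-3                                          # small init: early alignment regime
W = rng.normal(size=(m, d)) * scale                   # first-layer weights, one row per neuron
a = rng.normal(size=m) * scale                        # second-layer weights

# Reference direction for measuring alignment: the label-weighted data mean.
g = (y[:, None] * X).sum(axis=0)
g /= np.linalg.norm(g)

def forward(W, a):
    return np.maximum(X @ W.T, 0.0) @ a

loss0 = 0.5 * np.mean(y ** 2)                         # loss at init (network output ~ 0)

lr = 0.05
for _ in range(2000):
    pre = X @ W.T                                     # (n, m) preactivations
    act = np.maximum(pre, 0.0)                        # ReLU activations
    r = act @ a - y                                   # residual
    grad_a = act.T @ r / n
    grad_W = ((pre > 0) * (r[:, None] * a)).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

loss = 0.5 * np.mean((forward(W, a) - y) ** 2)

# Neurons that escaped the small-norm regime, and their alignment with g.
norms = np.linalg.norm(W, axis=1)
grown = norms > 5 * scale
cos = (W @ g) / norms

print(f"loss: {loss0:.3f} -> {loss:.3f}")
print(f"grown neurons: {grown.sum()}/{m}, mean cosine with g: {cos[grown].mean():.2f}")
```

In this regime, neurons barely move in norm at first while rotating toward a shared direction; only the aligned ones then grow, which is a minimal cartoon of how directional alignment can produce a simplicity bias.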