Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training

Max Milkert, David Hyde, Forrest John Laine
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:44175-44198, 2025.

Abstract

In a neural network with ReLU activations, the number of piecewise linear regions in the output can grow exponentially with depth. However, this is highly unlikely to happen when the initial parameters are sampled randomly, which often leads to the use of unnecessarily large networks. To address this problem, we introduce a novel parameterization of the network that restricts its weights so that a depth $d$ network produces exactly $2^d$ linear regions at initialization and maintains those regions throughout training under the parameterization. This approach allows us to learn approximations of convex, one-dimensional functions that are several orders of magnitude more accurate than their randomly initialized counterparts. We further demonstrate a preliminary extension of our construction to multidimensional and non-convex functions, allowing the technique to replace traditional dense layers in various architectures.
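To make the exponential-region phenomenon concrete, the sketch below (Python/NumPy, not the parameterization proposed in the paper) implements the classic tent-map construction: each layer is a width-2 ReLU layer computing t(x) = 2*relu(x) - 4*relu(x - 0.5), and composing d such layers yields a sawtooth with exactly 2^d linear regions on [0, 1]. The region-counting helper is an illustrative assumption added here for verification, not code from the paper.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sawtooth_net(x, depth):
    # Each iteration is one width-2 ReLU layer: hidden units relu(x) and
    # relu(x - 0.5) with output weights (2, -4). On [0, 1] this equals the
    # tent map t(x) = 2 * min(x, 1 - x).
    for _ in range(depth):
        x = 2.0 * relu(x) - 4.0 * relu(x - 0.5)
    return x

def count_linear_regions(depth):
    # Sample on a dyadic grid so every breakpoint k / 2**depth is hit exactly,
    # then count slope changes between consecutive segments.
    n = 64 * 2 ** depth
    xs = np.linspace(0.0, 1.0, n + 1)
    ys = sawtooth_net(xs, depth)
    slopes = np.diff(ys) / np.diff(xs)
    return 1 + int(np.sum(~np.isclose(slopes[1:], slopes[:-1])))

for d in range(1, 8):
    print(f"depth {d}: {count_linear_regions(d)} linear regions")  # 2, 4, 8, ...

A randomly initialized network of the same depth and width would typically realize far fewer linear regions than this hand-crafted example; closing that gap, while keeping the weights trainable, is the motivation for the parameterization studied in the paper.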

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-milkert25a,
  title     = {Compelling {R}e{LU} Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training},
  author    = {Milkert, Max and Hyde, David and Laine, Forrest John},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {44175--44198},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/milkert25a/milkert25a.pdf},
  url       = {https://proceedings.mlr.press/v267/milkert25a.html},
  abstract  = {In a neural network with ReLU activations, the number of piecewise linear regions in the output can grow exponentially with depth. However, this is highly unlikely to happen when the initial parameters are sampled randomly, which therefore often leads to the use of networks that are unnecessarily large. To address this problem, we introduce a novel parameterization of the network that restricts its weights so that a depth $d$ network produces exactly $2^d$ linear regions at initialization and maintains those regions throughout training under the parameterization. This approach allows us to learn approximations of convex, one-dimensional functions that are several orders of magnitude more accurate than their randomly initialized counterparts. We further demonstrate a preliminary extension of our construction to multidimensional and non-convex functions, allowing the technique to replace traditional dense layers in various architectures.}
}
Endnote
%0 Conference Paper
%T Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training
%A Max Milkert
%A David Hyde
%A Forrest John Laine
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-milkert25a
%I PMLR
%P 44175--44198
%U https://proceedings.mlr.press/v267/milkert25a.html
%V 267
%X In a neural network with ReLU activations, the number of piecewise linear regions in the output can grow exponentially with depth. However, this is highly unlikely to happen when the initial parameters are sampled randomly, which therefore often leads to the use of networks that are unnecessarily large. To address this problem, we introduce a novel parameterization of the network that restricts its weights so that a depth $d$ network produces exactly $2^d$ linear regions at initialization and maintains those regions throughout training under the parameterization. This approach allows us to learn approximations of convex, one-dimensional functions that are several orders of magnitude more accurate than their randomly initialized counterparts. We further demonstrate a preliminary extension of our construction to multidimensional and non-convex functions, allowing the technique to replace traditional dense layers in various architectures.
APA
Milkert, M., Hyde, D. & Laine, F.J. (2025). Compelling ReLU Networks to Exhibit Exponentially Many Linear Regions at Initialization and During Training. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:44175-44198. Available from https://proceedings.mlr.press/v267/milkert25a.html.