Implicit Regularization in ReLU Networks with the Square Loss

Gal Vardi; Ohad Shamir

Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:4224-4258, 2021.

Abstract

Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for regression losses such as the square loss. Perhaps surprisingly, we prove that even for a single ReLU neuron, it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters (although on the positive side, we show it can be characterized approximately). For one-hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, except for the “balancedness” property identified in Du et al. (2018). Our results suggest that a more general framework than the one considered so far may be needed to understand implicit regularization for nonlinear predictors, and provide some clues on what this framework should be.

Cite this Paper

BibTeX

@InProceedings{pmlr-v134-vardi21b,
  title = 	 {Implicit Regularization in ReLU Networks with the Square Loss},
  author =       {Vardi, Gal and Shamir, Ohad},
  booktitle = 	 {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages = 	 {4224--4258},
  year = 	 {2021},
  editor = 	 {Belkin, Mikhail and Kpotufe, Samory},
  volume = 	 {134},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--19 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v134/vardi21b/vardi21b.pdf},
  url = 	 {https://proceedings.mlr.press/v134/vardi21b.html},
  abstract = 	 {Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for regression losses such as the square loss. Perhaps surprisingly, we prove that even for a single ReLU neuron, it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters (although on the positive side, we show it can be characterized approximately). For one-hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, except for the “balancedness” property identified in Du et al. (2018). Our results suggest that a more general framework than the one considered so far may be needed to understand implicit regularization for nonlinear predictors, and provide some clues on what this framework should be.}
}

Endnote

%0 Conference Paper
%T Implicit Regularization in ReLU Networks with the Square Loss
%A Gal Vardi
%A Ohad Shamir
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-vardi21b
%I PMLR
%P 4224--4258
%U https://proceedings.mlr.press/v134/vardi21b.html
%V 134
%X Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for regression losses such as the square loss. Perhaps surprisingly, we prove that even for a single ReLU neuron, it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters (although on the positive side, we show it can be characterized approximately). For one-hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, except for the “balancedness” property identified in Du et al. (2018). Our results suggest that a more general framework than the one considered so far may be needed to understand implicit regularization for nonlinear predictors, and provide some clues on what this framework should be.

APA

Vardi, G. & Shamir, O. (2021). Implicit Regularization in ReLU Networks with the Square Loss. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:4224-4258. Available from https://proceedings.mlr.press/v134/vardi21b.html.