Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Tianyang Hu; Wenjia Wang; Cong Lin; Guang Cheng

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:829-837, 2021.

Abstract

Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the L2 estimation error with respect to the GD iteration, which is bounded away from zero without a delicate choice of early stopping. In turn, through a comprehensive analysis of L2-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network with L2 regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax optimal rate of the L2 estimation error is achieved. Numerical experiments confirm our theory and further demonstrate that the L2 regularization approach improves training robustness and works for a wider range of neural networks.
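
To make point (1) of the abstract concrete, here is a minimal NumPy sketch of kernel ridge regression with a neural tangent kernel on noisy data. It is an illustration under stated assumptions, not the paper's experimental setup: the kernel formula k(x, z) = <x, z>(pi - arccos <x, z>)/(2 pi) is the commonly used NTK for a one-hidden-layer ReLU network when only the hidden-layer weights are trained and inputs have unit norm. The target function, noise level, sample size, and the two values of the ridge parameter `mu` are all hypothetical choices, and how `mu` relates to the L2 penalty used in GD training is not specified here.

```python
import numpy as np

def ntk_relu(X, Z):
    """Assumed NTK for unit-norm inputs (hidden layer trained only):
    k(x, z) = <x, z> * (pi - arccos(<x, z>)) / (2 * pi)."""
    G = np.clip(X @ Z.T, -1.0, 1.0)  # clip guards arccos against round-off
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)

def krr_fit_predict(X_train, y_train, X_test, mu):
    """Kernel ridge regression: solve (K + mu * I) alpha = y, then predict."""
    K = ntk_relu(X_train, X_train)
    alpha = np.linalg.solve(K + mu * np.eye(len(y_train)), y_train)
    return ntk_relu(X_test, X_train) @ alpha

def f_true(t):
    # Hypothetical target function on the unit circle.
    return np.sin(2.0 * t)

rng = np.random.default_rng(0)
n = 60
theta = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.column_stack([np.cos(theta), np.sin(theta)])  # unit-norm inputs
y = f_true(theta) + 0.5 * rng.standard_normal(n)     # noisy responses

theta_test = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X_test = np.column_stack([np.cos(theta_test), np.sin(theta_test)])

K = ntk_relu(X, X)
results = {}
for mu in (1e-8, 1e-1):  # near-interpolation vs. a visible ridge penalty
    pred = krr_fit_predict(X, y, X_test, mu)
    train_mse = np.mean((krr_fit_predict(X, y, X, mu) - y) ** 2)
    test_mse = np.mean((pred - f_true(theta_test)) ** 2)
    results[mu] = (train_mse, test_mse)
    print(f"mu={mu:g}  train MSE={train_mse:.4f}  error vs. target={test_mse:.4f}")
```

With `mu` close to zero the fit nearly interpolates the noisy responses, which is the overfitting regime the abstract describes; a larger `mu` leaves a nonzero training residual. Comparing the two "error vs. target" values gives a rough feel for the effect of regularization on this toy problem, though a single run on synthetic data says nothing about rates.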

Cite this Paper

BibTeX

@InProceedings{pmlr-v130-hu21a,
  title = 	 {Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network},
  author =       {Hu, Tianyang and Wang, Wenjia and Lin, Cong and Cheng, Guang},
  booktitle = 	 {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {829--837},
  year = 	 {2021},
  editor = 	 {Banerjee, Arindam and Fukumizu, Kenji},
  volume = 	 {130},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v130/hu21a/hu21a.pdf},
  url = 	 {https://proceedings.mlr.press/v130/hu21a.html},
  abstract = 	 { Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noises. We establish a lower bound on the L2 estimation error with respect to the GD iteration, which is away from zero without a delicate choice of early stopping. In turn, through a comprehensive analysis of L2-regularized GD trajectories, we prove that for overparametrized one-hidden-layer ReLU neural network with the L2 regularization: (1) the output is close to that of the kernel ridge regression with the corresponding neural tangent kernel; (2) minimax optimal rate of the L2 estimation error is achieved. Numerical experiments confirm our theory and further demonstrate that the L2 regularization approach improves the training robustness and works for a wider range of neural networks. }
}

Endnote

%0 Conference Paper
%T Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network
%A Tianyang Hu
%A Wenjia Wang
%A Cong Lin
%A Guang Cheng
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-hu21a
%I PMLR
%P 829--837
%U https://proceedings.mlr.press/v130/hu21a.html
%V 130
%X  Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noises. We establish a lower bound on the L2 estimation error with respect to the GD iteration, which is away from zero without a delicate choice of early stopping. In turn, through a comprehensive analysis of L2-regularized GD trajectories, we prove that for overparametrized one-hidden-layer ReLU neural network with the L2 regularization: (1) the output is close to that of the kernel ridge regression with the corresponding neural tangent kernel; (2) minimax optimal rate of the L2 estimation error is achieved. Numerical experiments confirm our theory and further demonstrate that the L2 regularization approach improves the training robustness and works for a wider range of neural networks.

APA

Hu, T., Wang, W., Lin, C. & Cheng, G. (2021). Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:829-837. Available from https://proceedings.mlr.press/v130/hu21a.html.
