Implicit Regularization of Random Feature Models

Arthur Jacot; Berfin Simsek; Francesco Spadaro; Clement Hongler; Franck Gabriel

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4631-4640, 2020.

Abstract

Random Features (RF) models are used as efficient parametric approximations of kernel methods. We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR). For a Gaussian RF model with $P$ features, $N$ data points, and a ridge $\lambda$, we show that the average (i.e., expected) RF predictor is close to a KRR predictor with an \emph{effective ridge} $\tilde{\lambda}$. We show that $\tilde{\lambda} > \lambda$ and $\tilde{\lambda} \searrow \lambda$ monotonically as $P$ grows, thus revealing the \emph{implicit regularization effect} of finite RF sampling. We then compare the risk (i.e., test error) of the $\tilde{\lambda}$-KRR predictor with the average risk of the $\lambda$-RF predictor and obtain a precise and explicit bound on their difference. Finally, we find empirically that the test errors of the average $\lambda$-RF predictor and the $\tilde{\lambda}$-KRR predictor agree extremely well.
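The central claim of the abstract can be probed numerically. The sketch below is a minimal Monte Carlo illustration built on our own assumptions; it is not the paper's method. The assumptions are as follows.

- The kernel is squared-exponential on 1-D inputs.
- Each RF predictor minimizes $\frac{1}{N}\|y - F^\top\theta\|^2 + \lambda\|\theta\|^2$, with $P$ Gaussian features of covariance $K$ scaled by $1/\sqrt{P}$.
- The effective ridge is located by a grid search for the KRR ridge whose test predictions best match the averaged RF predictions. The paper's closed-form $\tilde{\lambda}$ is not used.

The helper names (`rbf_kernel`, `krr_predict`, `average_rf_predict`, `fit_effective_ridge`) are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel on 1-D inputs (an illustrative choice)."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length_scale**2))


def krr_predict(k_all, n_train, y, ridge):
    """KRR test predictions for the loss (1/N)||y - f(X)||^2 + ridge * ||f||_H^2.

    The first n_train rows/columns of k_all are training points, the rest are test points.
    """
    k_xx = k_all[:n_train, :n_train]
    k_tx = k_all[n_train:, :n_train]
    alpha = np.linalg.solve(k_xx + n_train * ridge * np.eye(n_train), y)
    return k_tx @ alpha


def average_rf_predict(k_all, n_train, y, ridge, n_features, n_samples, rng):
    """Monte Carlo estimate of the average Gaussian RF predictor on the test points."""
    # Matrix square root of K via eigh (robust to a numerically singular Gram matrix).
    w, v = np.linalg.eigh(k_all)
    root = v * np.sqrt(np.clip(w, 0.0, None))
    total = np.zeros(k_all.shape[0] - n_train)
    for _ in range(n_samples):
        # Each row is one Gaussian feature f_p ~ N(0, K) evaluated at all points;
        # dividing by sqrt(P) makes E[F^T F] = K.
        feats = rng.standard_normal((n_features, k_all.shape[0])) @ root.T
        feats /= np.sqrt(n_features)
        # Ridge regression on the features equals KRR with the random kernel F^T F.
        total += krr_predict(feats.T @ feats, n_train, y, ridge)
    return total / n_samples


def fit_effective_ridge(target, k_all, n_train, y, ridges):
    """Return the grid ridge whose KRR test predictions are closest to `target`."""
    errors = [np.linalg.norm(krr_predict(k_all, n_train, y, r) - target) for r in ridges]
    return ridges[int(np.argmin(errors))]


rng = np.random.default_rng(0)
n_train, n_test, lam = 40, 20, 1e-3
x = rng.uniform(-3.0, 3.0, n_train + n_test)
y = np.sin(2.0 * x[:n_train]) + 0.3 * rng.standard_normal(n_train)
k_all = rbf_kernel(x, x)
ridges = lam * np.logspace(0, 3, 61)  # candidate ridges from lam to 1000 * lam

effective = {}
for p in (10, 40, 160):
    avg_pred = average_rf_predict(k_all, n_train, y, lam, p, 2000, rng)
    effective[p] = fit_effective_ridge(avg_pred, k_all, n_train, y, ridges)
    print(f"P={p:4d}  fitted ridge / lambda = {effective[p] / lam:.2f}")
```

The fitted ridge plays the role of $\tilde{\lambda}$. If the implicit regularization effect described above holds, it should be noticeably larger than $\lambda$ for small $P$ and move back toward $\lambda$ as $P$ grows. Because the grid starts at $\lambda$, the fit can never report a ridge below it.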

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-jacot20a,
  title = 	 {Implicit Regularization of Random Feature Models},
  author =       {Jacot, Arthur and Simsek, Berfin and Spadaro, Francesco and Hongler, Clement and Gabriel, Franck},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {4631--4640},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/jacot20a/jacot20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/jacot20a.html},
  abstract = 	 {Random Features (RF) models are used as efficient parametric approximations of kernel methods. We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR). For a Gaussian RF model with $P$ features, $N$ data points, and a ridge $\lambda$, we show that the average (i.e. expected) RF predictor is close to a KRR predictor with an \emph{effective ridge} $\tilde{\lambda}$. We show that $\tilde{\lambda} > \lambda$ and $\tilde{\lambda} \searrow \lambda$ monotonically as $P$ grows, thus revealing the \emph{implicit regularization effect} of finite RF sampling. We then compare the risk (i.e. test error) of the $\tilde{\lambda}$-KRR predictor with the average risk of the $\lambda$-RF predictor and obtain a precise and explicit bound on their difference. Finally, we empirically find an extremely good agreement between the test errors of the average $\lambda$-RF predictor and $\tilde{\lambda}$-KRR predictor.}
}

Endnote

%0 Conference Paper
%T Implicit Regularization of Random Feature Models
%A Arthur Jacot
%A Berfin Simsek
%A Francesco Spadaro
%A Clement Hongler
%A Franck Gabriel
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-jacot20a
%I PMLR
%P 4631--4640
%U https://proceedings.mlr.press/v119/jacot20a.html
%V 119
%X Random Features (RF) models are used as efficient parametric approximations of kernel methods. We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR). For a Gaussian RF model with $P$ features, $N$ data points, and a ridge $\lambda$, we show that the average (i.e. expected) RF predictor is close to a KRR predictor with an \emph{effective ridge} $\tilde{\lambda}$. We show that $\tilde{\lambda} > \lambda$ and $\tilde{\lambda} \searrow \lambda$ monotonically as $P$ grows, thus revealing the \emph{implicit regularization effect} of finite RF sampling. We then compare the risk (i.e. test error) of the $\tilde{\lambda}$-KRR predictor with the average risk of the $\lambda$-RF predictor and obtain a precise and explicit bound on their difference. Finally, we empirically find an extremely good agreement between the test errors of the average $\lambda$-RF predictor and $\tilde{\lambda}$-KRR predictor.

APA


Jacot, A., Simsek, B., Spadaro, F., Hongler, C., & Gabriel, F. (2020). Implicit Regularization of Random Feature Models. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4631-4640. Available from https://proceedings.mlr.press/v119/jacot20a.html.