Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

Tianyi Liu; Yan Li; Enlu Zhou; Tuo Zhao

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:2784-2802, 2022.

Abstract

We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an over-parameterization model. Specifically, we parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions, the estimator, obtained by the randomly perturbed gradient descent algorithm using the square loss function, attains a mean square error of $O(\sigma^2/d)$, where $\sigma^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training over-parameterized neural networks.
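
As a rough illustration of the setup described in the abstract, the sketch below fits $XX^\top$ to an observation $Y$ by gradient descent on a square loss, optionally evaluating each gradient at a randomly perturbed copy of the iterate. This is only one common form of perturbed gradient descent and may differ from the algorithm analyzed in the paper. The loss scaling $\frac{1}{4}\|XX^\top - Y\|_F^2$, the Gaussian perturbation, the small random initialization, and all names and default values (`perturbed_gd`, `noise_scale`, `lr`, `steps`) are assumptions made for this example.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def residual(X, Y):
    """Return R = X X^T - Y."""
    G = matmul(X, transpose(X))
    return [[g - y for g, y in zip(g_row, y_row)] for g_row, y_row in zip(G, Y)]


def loss(X, Y):
    """Square loss (1/4) * ||X X^T - Y||_F^2 (the scaling is an assumption)."""
    return 0.25 * sum(r * r for row in residual(X, Y) for r in row)


def grad(X, Y):
    """Gradient of the loss above: (1/2) (R + R^T) X, valid for non-symmetric Y."""
    R = residual(X, Y)
    d = len(R)
    S = [[0.5 * (R[i][j] + R[j][i]) for j in range(d)] for i in range(d)]
    return matmul(S, X)


def perturbed_gd(Y, steps=500, lr=0.05, noise_scale=0.0, seed=0):
    """Gradient descent on X, with the gradient taken at X + noise.

    noise_scale=0.0 gives plain gradient descent; noise_scale>0 adds an
    independent Gaussian perturbation to every entry at every step.
    """
    rng = random.Random(seed)
    d = len(Y)
    # Small random initialization (an assumption of this sketch).
    X = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    for _ in range(steps):
        X_pert = [[x + noise_scale * rng.gauss(0.0, 1.0) for x in row] for row in X]
        G = grad(X_pert, Y)
        X = [[x - lr * g for x, g in zip(x_row, g_row)] for x_row, g_row in zip(X, G)]
    return X
```

Calling `perturbed_gd(Y, noise_scale=0.0)` and `perturbed_gd(Y, noise_scale=s)` for some small `s > 0` on the same noisy `Y`, and comparing `matmul(X, transpose(X))` against the true $Y^*$, gives a way to experiment with the two estimators; the sketch makes no claim to reproduce the rates stated above.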

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-liu22c,
  title = 	 {Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably},
  author =       {Liu, Tianyi and Li, Yan and Zhou, Enlu and Zhao, Tuo},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2784--2802},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/liu22c/liu22c.pdf},
  url = 	 {https://proceedings.mlr.press/v151/liu22c.html},
  abstract = 	 { We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an over-parameterization model. Specifically, we parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions, the estimator, obtained by the randomly perturbed gradient descent algorithm using the square loss function, attains a mean square error of $O(\sigma^2/d)$, where $\sigma^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training over-parameterized neural networks. }
}

Endnote

%0 Conference Paper
%T Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
%A Tianyi Liu
%A Yan Li
%A Enlu Zhou
%A Tuo Zhao
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera	
%F pmlr-v151-liu22c
%I PMLR
%P 2784--2802
%U https://proceedings.mlr.press/v151/liu22c.html
%V 151
%X  We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an over-parameterization model. Specifically, we parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions, the estimator, obtained by the randomly perturbed gradient descent algorithm using the square loss function, attains a mean square error of $O(\sigma^2/d)$, where $\sigma^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training over-parameterized neural networks.

APA


Liu, T., Li, Y., Zhou, E. & Zhao, T. (2022). Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:2784-2802. Available from https://proceedings.mlr.press/v151/liu22c.html.
