Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:2784-2802, 2022.
Abstract
We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^* \in \mathbb{R}^{d \times d}$ from a noisy observation $Y$ using an over-parameterized model: we parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X \in \mathbb{R}^{d \times d}$. We then show that, under mild conditions, the estimator obtained by the randomly perturbed gradient descent algorithm using the square loss function attains a mean square error of $O(\sigma^2/d)$, where $\sigma^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training over-parameterized neural networks.
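The comparison described in the abstract can be illustrated with a small numerical sketch. The abstract does not specify the algorithm's details, so everything below is an assumption made for illustration: the symmetric Gaussian noise model, the loss scaling $\frac{1}{4}\|XX^\top - Y\|_F^2$, the small random initialization, the step size, the perturbation scale, and the function names `make_problem`, `grad`, `descend`, and `mse`. It is not the authors' implementation and is not expected to reproduce the stated rates.

```python
import numpy as np


def make_problem(d, sigma, rng):
    """Build a rank one target Y* = u u^T and a noisy observation Y.

    The symmetric Gaussian noise used here is an assumption of this sketch.
    """
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    y_star = np.outer(u, u)
    noise = sigma * rng.standard_normal((d, d))
    noise = (noise + noise.T) / 2.0  # symmetrize so that Y is symmetric
    return y_star, y_star + noise


def grad(x, y):
    """Gradient of f(X) = 0.25 * ||X X^T - Y||_F^2 for symmetric Y."""
    return (x @ x.T - y) @ x


def descend(y, steps, lr, perturb_std, rng, init_scale=1e-2):
    """Gradient descent on the over-parameterized factor X (d x d).

    With perturb_std > 0 the gradient is evaluated at a randomly
    perturbed copy of X (one common form of perturbed gradient descent,
    assumed here); with perturb_std == 0 this is plain gradient descent.
    """
    d = y.shape[0]
    x = init_scale * rng.standard_normal((d, d))
    for _ in range(steps):
        probe = x + perturb_std * rng.standard_normal((d, d))
        x = x - lr * grad(probe, y)
    return x @ x.T  # estimator of Y*


def mse(estimate, y_star):
    """Entrywise mean square error between the estimator and Y*."""
    return float(np.mean((estimate - y_star) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_star, y = make_problem(d=20, sigma=0.1, rng=rng)
    plain = descend(y, steps=500, lr=0.1, perturb_std=0.0, rng=rng)
    perturbed = descend(y, steps=500, lr=0.1, perturb_std=0.05, rng=rng)
    print("MSE, plain GD:    ", mse(plain, y_star))
    print("MSE, perturbed GD:", mse(perturbed, y_star))
```

Running the script prints the two errors for a single random instance. Any gap between them depends on the assumed hyperparameters, so the numbers are only a starting point for experimenting with `d`, `sigma`, and `perturb_std`.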