Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations
Proceedings of the 31st Conference On Learning Theory, PMLR 75:2-47, 2018.
Abstract
We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^2)$ random linear measurements of a rank $r$ positive semidefinite matrix $X^\star$, we can recover $X^\star$ by parameterizing it as $UU^\top$ with $U \in \mathbb{R}^{d \times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that, starting from a small initialization, gradient descent approximately recovers $X^\star$ in $\tilde{O}(\sqrt{r})$ iterations. These results resolve the conjecture of Gunasekar et al. (2017) under the restricted isometry property. With some technical modifications, the technique also applies to the analysis of one-hidden-layer neural networks with quadratic activations.
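The setup described above can be illustrated with a short numerical sketch: draw random linear measurements of a low-rank PSD matrix, over-parameterize it as $UU^\top$ with a full $d \times d$ factor, and run gradient descent on the squared loss from a small initialization. The NumPy snippet below is a minimal illustration only; the Gaussian measurement matrices, dimensions, step size, and initialization scale are assumptions chosen for demonstration and are not the paper's experimental code.

```python
import numpy as np

# Minimal sketch of over-parameterized matrix sensing with gradient descent.
# All constants below (d, r, m, alpha, eta) are illustrative choices.

rng = np.random.default_rng(0)
d, r = 30, 2                        # ambient dimension and true rank (r << d)
m = 6 * d * r ** 2                  # number of measurements, on the order of d * r^2

# Ground-truth rank-r PSD matrix, normalized to unit spectral norm
B = rng.standard_normal((d, r))
X_star = B @ B.T
X_star /= np.linalg.norm(X_star, 2)

# Linear measurements b_i = <A_i, X_star> with Gaussian A_i
A = rng.standard_normal((m, d, d))
b = np.einsum('kij,ij->k', A, X_star)

# Over-parameterized variable U in R^{d x d}, small random initialization
alpha, eta = 1e-3, 0.1
U = alpha * rng.standard_normal((d, d))

# Gradient descent on f(U) = (1/(2m)) * sum_i (<A_i, U U^T> - b_i)^2
for _ in range(3000):
    residual = np.einsum('kij,ij->k', A, U @ U.T) - b   # <A_i, U U^T> - b_i
    S = np.einsum('k,kij->ij', residual, A) / m         # (1/m) * sum_i residual_i * A_i
    U -= eta * (S + S.T) @ U                            # grad f(U) = (S + S^T) U

err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
print(f"relative Frobenius error: {err:.3e}")
```

Even though $U$ has $d^2$ parameters and the loss has many global minima of high rank, the iterates started near zero stay close to low-rank matrices, which is the implicit regularization effect the paper analyzes.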