Algorithmic Regularization in Overparameterized Matrix Sensing and Neural Networks with Quadratic Activations
[edit]
Proceedings of the 31st Conference On Learning Theory, PMLR 75:247, 2018.
Abstract
We show that the gradient descent algorithm provides an implicit regularization effect in the learning of overparameterized matrix factorization models and onehiddenlayer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank $r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it by $UU^\top$ with $U\in \mathbb R^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations approximately. The results solve the conjecture of Gunasekar et al.’17 under the restricted isometry property. The technique can be applied to analyzing neural networks with onehiddenlayer quadratic activations with some technical modifications.
Related Material


