Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

Yuanzhi Li; Tengyu Ma; Hongyang Zhang

Proceedings of the 31st Conference On Learning Theory, PMLR 75:2-47, 2018.

Abstract

We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank $r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it by $UU^\top$ with $U\in \mathbb R^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations approximately. The results solve the conjecture of Gunasekar et al.’17 under the restricted isometry property. The technique can be applied to analyzing neural networks with one-hidden-layer quadratic activations with some technical modifications.
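
The setup described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code. The measurement model (symmetrized Gaussian matrices), the loss scaling, the identity-scaled initialization, and all numeric values (d, r, m, alpha, eta, steps) are assumptions chosen for illustration, and the function name run_matrix_sensing is hypothetical. The sketch runs plain gradient descent on a full d x d factor U from a small initialization, using fewer measurements than the number of free entries in a symmetric d x d matrix, and reports how close UU^T ends up to the rank-r ground truth.

```python
import numpy as np


def run_matrix_sensing(d=40, r=2, m=600, alpha=1e-3, eta=0.05, steps=1000, seed=0):
    """Gradient descent on f(U) = 1/(2m) * sum_i (<A_i, U U^T> - y_i)^2.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # Rank-r PSD ground truth X* = V V^T with orthonormal columns in V.
    V, _ = np.linalg.qr(rng.standard_normal((d, r)))
    X_star = V @ V.T

    # Assumed measurement model: symmetrized Gaussian sensing matrices.
    G = rng.standard_normal((m, d, d))
    A = (G + G.transpose(0, 2, 1)) / 2
    A_flat = A.reshape(m, d * d)
    y = A_flat @ X_star.ravel()  # y_i = <A_i, X*>

    # Over-parameterized factor: U is d x d, started at a small scale.
    U = alpha * np.eye(d)

    losses = []
    for _ in range(steps):
        residual = A_flat @ (U @ U.T).ravel() - y
        losses.append(0.5 * np.mean(residual ** 2))
        # M = (1/m) * sum_i residual_i * A_i; for symmetric A_i the
        # gradient of f with respect to U is 2 * M @ U.
        M = (A_flat.T @ residual).reshape(d, d) / m
        U = U - eta * 2.0 * M @ U

    rel_err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
    return rel_err, losses
```

Calling rel_err, losses = run_matrix_sensing() returns the relative Frobenius error of UU^T against X* and the loss history. With the default values, m = 600 is below the d(d+1)/2 = 820 degrees of freedom of a symmetric 40 x 40 matrix, so the measurements alone do not pin down the solution. Increasing alpha is one way to probe how the recovered matrix depends on the initialization scale.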

Cite this Paper

BibTeX


@InProceedings{pmlr-v75-li18a,
  title = 	 {Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations},
  author =       {Li, Yuanzhi and Ma, Tengyu and Zhang, Hongyang},
  booktitle = 	 {Proceedings of the 31st Conference On Learning Theory},
  pages = 	 {2--47},
  year = 	 {2018},
  editor = 	 {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = 	 {75},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v75/li18a/li18a.pdf},
  url = 	 {https://proceedings.mlr.press/v75/li18a.html},
  abstract = 	 {We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank $r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it by $UU^\top$ with $U\in \mathbb R^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations approximately. The results solve the conjecture of Gunasekar et al.’17 under the restricted isometry property.  The technique can be applied to analyzing neural networks with one-hidden-layer quadratic activations with some technical modifications.}
}

Endnote

%0 Conference Paper
%T Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations
%A Yuanzhi Li
%A Tengyu Ma
%A Hongyang Zhang
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet	
%F pmlr-v75-li18a
%I PMLR
%P 2--47
%U https://proceedings.mlr.press/v75/li18a.html
%V 75
%X We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank $r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it by $UU^\top$ with $U\in \mathbb R^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations approximately. The results solve the conjecture of Gunasekar et al.’17 under the restricted isometry property.  The technique can be applied to analyzing neural networks with one-hidden-layer quadratic activations with some technical modifications.

APA


Li, Y., Ma, T., & Zhang, H. (2018). Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:2-47. Available from https://proceedings.mlr.press/v75/li18a.html.