Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion

Jianhao Ma; Salar Fattahi

Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:3683-3742, 2024.

Abstract

We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $X^\star \in \mathbb{R}^{d\times d}$ of rank-$r$, parameterized by $UU^{\top}$, from only a subset of its observed entries. We show that the vanilla gradient descent (GD) with small initialization provably converges to the ground truth $X^\star$ without requiring any explicit regularization. This convergence result holds true even in the over-parameterized scenario, where the true rank $r$ is unknown and conservatively over-estimated by a search rank $r'\gg r$. The existing results for this problem either require explicit regularization, a sufficiently accurate initial point, or exact knowledge of the true rank $r$. In the over-parameterized regime where $r'\geq r$, we show that, with $\widetilde\Omega(dr^9)$ observations, GD with an initial point $\|U_0\| \leq O(\epsilon)$ converges near-linearly to an $\epsilon$-neighborhood of $X^\star$. Consequently, smaller initial points result in increasingly accurate solutions. Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$, and they are only governed by the true rank $r$. In the exactly-parameterized regime where $r'=r$, we further enhance this result by proving that GD converges at a faster rate to achieve an arbitrarily small accuracy $\epsilon>0$, provided the initial point satisfies $\|U_0\| = O(1/d)$. At the crux of our method lies a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, extending beyond what was previously possible using the classical leave-one-out analysis.
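
For readers who want to experiment with the setting described in the abstract, the following is a minimal NumPy sketch of unregularized gradient descent on the factor $U$, started from a small random point. It is an illustration under stated assumptions, not the authors' code: the objective $f(U) = \frac{1}{4p}\|P_\Omega(UU^\top - X^\star)\|_F^2$, the $1/p$ rescaling, the Gaussian initialization with scale `alpha`, the step size, the iteration count, and the helper names `make_problem` and `gd_small_init` are all hypothetical choices that do not appear on this page.

```python
import numpy as np


def make_problem(d=50, r=2, p=0.8, seed=0):
    """Build a random rank-r PSD matrix and a symmetric observation mask.

    All sizes and the Bernoulli(p) sampling model are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((d, r)) / np.sqrt(d)  # eigenvalues of V V^T are O(1)
    X_star = V @ V.T
    upper = np.triu(rng.random((d, d)) < p)       # sample the upper triangle
    mask = (upper | upper.T).astype(float)        # symmetrize the pattern
    return X_star, mask


def gd_small_init(X_star, mask, search_rank, alpha=1e-3, step=0.1,
                  iters=1000, seed=0):
    """Plain GD on f(U) = 1/(4p) * ||mask * (U U^T - X_star)||_F^2.

    No regularizer, no projection, no spectral initialization: the only
    ingredient is the small starting point alpha * (Gaussian / sqrt(d)).
    """
    rng = np.random.default_rng(seed)
    d = X_star.shape[0]
    p = mask.mean()                               # empirical sampling rate
    U = alpha * rng.standard_normal((d, search_rank)) / np.sqrt(d)
    for _ in range(iters):
        # Only observed entries enter the residual; it is symmetric
        # because the mask is symmetric.
        residual = mask * (U @ U.T - X_star)
        U = U - step * (residual @ U) / p         # gradient of f at U
    return U


if __name__ == "__main__":
    X_star, mask = make_problem(d=50, r=2, p=0.8, seed=1)
    # Search rank 4 over-estimates the true rank 2.
    U = gd_small_init(X_star, mask, search_rank=4)
    rel_err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
    print(f"relative error: {rel_err:.2e}")
```

Varying `alpha` and `search_rank` in this toy setup is one way to probe the abstract's claims that smaller initial points give more accurate solutions and that the search rank $r'$ does not drive the final accuracy. A small experiment of this kind is only suggestive and does not replace the paper's analysis.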

Cite this Paper

BibTeX

@InProceedings{pmlr-v247-ma24a,
  title = 	 {Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion},
  author =       {Ma, Jianhao and Fattahi, Salar},
  booktitle = 	 {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages = 	 {3683--3742},
  year = 	 {2024},
  editor = 	 {Agrawal, Shipra and Roth, Aaron},
  volume = 	 {247},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {30 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v247/ma24a/ma24a.pdf},
  url = 	 {https://proceedings.mlr.press/v247/ma24a.html},
  abstract = 	 {We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $X^\star \in \mathbb{R}^{d\times d}$ of rank-$r$, parameterized by $UU^{\top}$, from only a subset of its observed entries. We show that the vanilla gradient descent (GD) with small initialization provably converges to the ground truth $X^\star$ without requiring any explicit regularization. This convergence result holds true even in the over-parameterized scenario, where the true rank $r$ is unknown and conservatively over-estimated by a search rank $r'\gg r$. The existing results for this problem either require explicit regularization, a sufficiently accurate initial point, or exact knowledge of the true rank $r$. In the over-parameterized regime where $r'\geq r$, we show that, with $\widetilde\Omega(dr^9)$ observations, GD with an initial point $\|U_0\| \leq O(\epsilon)$ converges near-linearly to an $\epsilon$-neighborhood of $X^\star$. Consequently, smaller initial points result in increasingly accurate solutions. Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$, and they are only governed by the true rank $r$. In the exactly-parameterized regime where $r'=r$, we further enhance this result by proving that GD converges at a faster rate to achieve an arbitrarily small accuracy $\epsilon>0$, provided the initial point satisfies $\|U_0\| = O(1/d)$. At the crux of our method lies a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, extending beyond what was previously possible using the classical leave-one-out analysis.}
}

Endnote

%0 Conference Paper
%T Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion
%A Jianhao Ma
%A Salar Fattahi
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-ma24a
%I PMLR
%P 3683--3742
%U https://proceedings.mlr.press/v247/ma24a.html
%V 247
%X We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $X^\star \in \mathbb{R}^{d\times d}$ of rank-$r$, parameterized by $UU^{\top}$, from only a subset of its observed entries. We show that the vanilla gradient descent (GD) with small initialization provably converges to the ground truth $X^\star$ without requiring any explicit regularization. This convergence result holds true even in the over-parameterized scenario, where the true rank $r$ is unknown and conservatively over-estimated by a search rank $r'\gg r$. The existing results for this problem either require explicit regularization, a sufficiently accurate initial point, or exact knowledge of the true rank $r$. In the over-parameterized regime where $r'\geq r$, we show that, with $\widetilde\Omega(dr^9)$ observations, GD with an initial point $\|U_0\| \leq O(\epsilon)$ converges near-linearly to an $\epsilon$-neighborhood of $X^\star$. Consequently, smaller initial points result in increasingly accurate solutions. Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$, and they are only governed by the true rank $r$. In the exactly-parameterized regime where $r'=r$, we further enhance this result by proving that GD converges at a faster rate to achieve an arbitrarily small accuracy $\epsilon>0$, provided the initial point satisfies $\|U_0\| = O(1/d)$. At the crux of our method lies a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, extending beyond what was previously possible using the classical leave-one-out analysis.

APA

Ma, J. & Fattahi, S. (2024). Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:3683-3742. Available from https://proceedings.mlr.press/v247/ma24a.html.
