[edit]
Open Problem: How much overparametrization is needed for ALS in tensor decomposition?
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:7105-7110, 2026.
Abstract
We ask how much overparameterization is needed for simple iterative methods such as alternating least squares (ALS) and gradient descent to decompose a third-order tensor. This question can be viewed as a basic setting to study feature learning: when a rank-$r$ tensor in ambient dimension $n$ has $r\ll n$, the latent rank-one components are the features, and $k$ is the amount of overparameterization used by the algorithm. For rank $r$ tensors, recent work shows that overparametrized rank $k=O(r^2)$ suffices for the popular ALS heuristic (with random initialization) to converge to a global optima. Is the quadratic dependence on $r$ an inherent barrier for ALS-like methods? We pose the open problem of proving convergence to the global optimum for $k=o(r^2)$, or proving that a lower bound on the overparametrized rank of $k=\Omega(r^{1+c})$ for some absolute constant $c>0$ is necessary.