[edit]
Sparse PCA Beyond Covariance Thresholding
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:4737-4776, 2023.
Abstract
In the Wishart model for sparse PCAwe are given $n$ samples $Y_1,\ldots, Y_n$ drawn independently from a $d$-dimensional Gaussian distribution $N({0, Id + \beta vv^\top})$, where $\beta > 0$ and $v\in \mathbb{R}^d$ is a $k$-sparse unit vector, and we wish to recover $v$ (up to sign).We show that if $n \ge \Omega(d)$, then for every $t \ll k$ there exists an algorithm running in time $n\cdot d^{O(t)}$ that solves this problem as long as\[\beta \gtrsim \frac{k}{\sqrt{nt}}\sqrt{\ln({2 + td/k^2})}\,.\]Prior to this work, the best polynomial time algorithm in the regime $k\approx \sqrt{d}$, called \emph{Covariance Thresholding} (proposed in Krauthgamer et al. (2015) and analyzed in Deshpande and Montanari (2014))), required $\beta \gtrsim \frac{k}{\sqrt{n}}\sqrt{\ln({2 + d/k^2})}$.For large enough constant $t$ our algorithm runs in polynomial time and has better guarantees than Covariance Thresholding.Previously known algorithms with such guarantees required quasi-polynomial time $d^{O(\log d)}$.Our idea is based on the idea of Alon et al. (1998) for reducing the clique size in the planted clique problem.Moreover, we show that it is possible to combine our techniques with recent results on sparse PCA with symmetric heavy-tailed noise d’Orsi et al. (2022).Their model generalizes both sparse PCA and the planted clique problem.In particular, in the regime $k \approx \sqrt{d}$ we get the first polynomial time algorithm that works with symmetric heavy-tailed noise, while the algorithm from d’Orsi et al. (2022) requires quasi-polynomial time in these settings. As a consequence, we get an algorithm that solves a problem that captures both sparse PCA and planted clique and achieves best known guarantees for both of them.In addition, we show that our techniques work with sparse PCA with adversarial perturbations studied in d’Orsi et al. (2020).This model generalizes not only sparse PCA, but also the sparse planted vector problem.As a consequence, we provide polynomial time algorithms for the sparse planted vector problem that have better guarantees thanthe state of the art in some regimes.