Average-Case Integrality Gap for Non-Negative Principal Component Analysis

Afonso Bandeira; Dmitriy Kunisky; Alexander Wein

Average-Case Integrality Gap for Non-Negative Principal Component Analysis

Afonso Bandeira, Dmitriy Kunisky, Alexander Wein

Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, PMLR 145:153-171, 2022.

Abstract

Montanari and Richard (2015) asked whether a natural semidefinite programming (SDP) relaxation can effectively optimize $\bx^{\top}\bW \bx$ over $\|\bx\| = 1$ with $x_i \geq 0$ for all coordinates $i$, where $\bW \in \RR^{n \times n}$ is drawn from the Gaussian orthogonal ensemble (GOE) or a spiked matrix model. In small numerical experiments, this SDP appears to be \emph{tight} for the GOE, producing a rank-one optimal matrix solution aligned with the optimal vector $\bx$. We prove, however, that as $n \to \infty$ the SDP is not tight, and certifies an upper bound asymptotically no better than the simple spectral bound $\lambda_{\max}(\bW)$ on this objective function. We also provide evidence, using tools from recent literature on hypothesis testing with low-degree polynomials, that no subexponential-time certification algorithm can improve on this behavior. Finally, we present further numerical experiments estimating how large $n$ would need to be before this limiting behavior becomes evident, providing a cautionary example against extrapolating asymptotics of SDPs in high dimension from their efficacy in small “laptop scale” computations.

Cite this Paper

BibTeX


@InProceedings{pmlr-v145-bandeira22a,
  title = 	 {Average-Case Integrality Gap for Non-Negative Principal Component Analysis},
  author =       {Bandeira, Afonso and Kunisky, Dmitriy and Wein, Alexander},
  booktitle = 	 {Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference},
  pages = 	 {153--171},
  year = 	 {2022},
  editor = 	 {Bruna, Joan and Hesthaven, Jan and Zdeborova, Lenka},
  volume = 	 {145},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--19 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v145/bandeira22a/bandeira22a.pdf},
  url = 	 {https://proceedings.mlr.press/v145/bandeira22a.html},
  abstract = 	 { Montanari and Richard (2015) asked whether a natural semidefinite programming (SDP) relaxation can effectively optimize $\bx^{\top}\bW \bx$ over $\|\bx\| = 1$ with $x_i \geq 0$ for all coordinates $i$, where $\bW \in \RR^{n \times n}$ is drawn from the Gaussian orthogonal ensemble (GOE) or a spiked matrix model. In small numerical experiments, this SDP appears to be \emph{tight} for the GOE, producing a rank-one optimal matrix solution aligned with the optimal vector $\bx$. We prove, however, that as $n \to \infty$ the SDP is not tight, and certifies an upper bound asymptotically no better than the simple spectral bound $\lambda_{\max}(\bW)$ on this objective function. We also provide evidence, using tools from recent literature on hypothesis testing with low-degree polynomials, that no subexponential-time certification algorithm can improve on this behavior. Finally, we present further numerical experiments estimating how large $n$ would need to be before this limiting behavior becomes evident, providing a cautionary example against extrapolating asymptotics of SDPs in high dimension from their efficacy in small “laptop scale” computations.}
}

Endnote

%0 Conference Paper
%T Average-Case Integrality Gap for Non-Negative Principal Component Analysis
%A Afonso Bandeira
%A Dmitriy Kunisky
%A Alexander Wein
%B Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference
%C Proceedings of Machine Learning Research
%D 2022
%E Joan Bruna
%E Jan Hesthaven
%E Lenka Zdeborova	
%F pmlr-v145-bandeira22a
%I PMLR
%P 153--171
%U https://proceedings.mlr.press/v145/bandeira22a.html
%V 145
%X  Montanari and Richard (2015) asked whether a natural semidefinite programming (SDP) relaxation can effectively optimize $\bx^{\top}\bW \bx$ over $\|\bx\| = 1$ with $x_i \geq 0$ for all coordinates $i$, where $\bW \in \RR^{n \times n}$ is drawn from the Gaussian orthogonal ensemble (GOE) or a spiked matrix model. In small numerical experiments, this SDP appears to be \emph{tight} for the GOE, producing a rank-one optimal matrix solution aligned with the optimal vector $\bx$. We prove, however, that as $n \to \infty$ the SDP is not tight, and certifies an upper bound asymptotically no better than the simple spectral bound $\lambda_{\max}(\bW)$ on this objective function. We also provide evidence, using tools from recent literature on hypothesis testing with low-degree polynomials, that no subexponential-time certification algorithm can improve on this behavior. Finally, we present further numerical experiments estimating how large $n$ would need to be before this limiting behavior becomes evident, providing a cautionary example against extrapolating asymptotics of SDPs in high dimension from their efficacy in small “laptop scale” computations.

APA


Bandeira, A., Kunisky, D. & Wein, A.. (2022). Average-Case Integrality Gap for Non-Negative Principal Component Analysis. Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, in Proceedings of Machine Learning Research 145:153-171 Available from https://proceedings.mlr.press/v145/bandeira22a.html.

Related Material

Download PDF