[edit]
Support Recovery in Sparse PCA with General Missing Data
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:2163-2187, 2024.
Abstract
We analyze a sparse PCA algorithm for incomplete and noisy data without any specific model assumption on the data missing scheme. We utilize a graphical approach to characterize general missing patterns, which enables us to analyze the effect of structural properties of missing patterns on the solvability of sparse PCA problem. The sparse PCA method we focus on is a semidefinite relaxation of the $\ell_1$-regularized PCA problem. We provide theoretical justification that the support of the sparse leading eigenvector can be recovered with high probability using the algorithm, under certain conditions. The conditions involve the spectral gap between the largest and second-largest eigenvalues of the true data matrix, the magnitude of the noise, and the structural properties of the missing pattern. The concepts of algebraic connectivity and irregularity are used to describe the properties in a graphical way. We empirically justify our theorem with synthetic data analysis. We show that the SDP algorithm outperforms other sparse PCA approaches especially when the observation pattern has good structural properties. As a by-product of our analysis, we provide two theorems to handle general missing schemes, which can be applied to other problems related to incomplete data matrices.