Support Recovery in Sparse PCA with General Missing Data

Hanbyul Lee, Qifan Song, Jean Honorio
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:2163-2187, 2024.

Abstract

We analyze a sparse PCA algorithm for incomplete and noisy data without any specific model assumption on the data missing scheme. We utilize a graphical approach to characterize general missing patterns, which enables us to analyze the effect of structural properties of missing patterns on the solvability of sparse PCA problem. The sparse PCA method we focus on is a semidefinite relaxation of the $\ell_1$-regularized PCA problem. We provide theoretical justification that the support of the sparse leading eigenvector can be recovered with high probability using the algorithm, under certain conditions. The conditions involve the spectral gap between the largest and second-largest eigenvalues of the true data matrix, the magnitude of the noise, and the structural properties of the missing pattern. The concepts of algebraic connectivity and irregularity are used to describe the properties in a graphical way. We empirically justify our theorem with synthetic data analysis. We show that the SDP algorithm outperforms other sparse PCA approaches especially when the observation pattern has good structural properties. As a by-product of our analysis, we provide two theorems to handle general missing schemes, which can be applied to other problems related to incomplete data matrices.

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-lee24a, title = {Support Recovery in Sparse PCA with General Missing Data}, author = {Lee, Hanbyul and Song, Qifan and Honorio, Jean}, booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence}, pages = {2163--2187}, year = {2024}, editor = {Kiyavash, Negar and Mooij, Joris M.}, volume = {244}, series = {Proceedings of Machine Learning Research}, month = {15--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/lee24a/lee24a.pdf}, url = {https://proceedings.mlr.press/v244/lee24a.html}, abstract = {We analyze a sparse PCA algorithm for incomplete and noisy data without any specific model assumption on the data missing scheme. We utilize a graphical approach to characterize general missing patterns, which enables us to analyze the effect of structural properties of missing patterns on the solvability of sparse PCA problem. The sparse PCA method we focus on is a semidefinite relaxation of the $\ell_1$-regularized PCA problem. We provide theoretical justification that the support of the sparse leading eigenvector can be recovered with high probability using the algorithm, under certain conditions. The conditions involve the spectral gap between the largest and second-largest eigenvalues of the true data matrix, the magnitude of the noise, and the structural properties of the missing pattern. The concepts of algebraic connectivity and irregularity are used to describe the properties in a graphical way. We empirically justify our theorem with synthetic data analysis. We show that the SDP algorithm outperforms other sparse PCA approaches especially when the observation pattern has good structural properties. As a by-product of our analysis, we provide two theorems to handle general missing schemes, which can be applied to other problems related to incomplete data matrices.} }
Endnote
%0 Conference Paper %T Support Recovery in Sparse PCA with General Missing Data %A Hanbyul Lee %A Qifan Song %A Jean Honorio %B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2024 %E Negar Kiyavash %E Joris M. Mooij %F pmlr-v244-lee24a %I PMLR %P 2163--2187 %U https://proceedings.mlr.press/v244/lee24a.html %V 244 %X We analyze a sparse PCA algorithm for incomplete and noisy data without any specific model assumption on the data missing scheme. We utilize a graphical approach to characterize general missing patterns, which enables us to analyze the effect of structural properties of missing patterns on the solvability of sparse PCA problem. The sparse PCA method we focus on is a semidefinite relaxation of the $\ell_1$-regularized PCA problem. We provide theoretical justification that the support of the sparse leading eigenvector can be recovered with high probability using the algorithm, under certain conditions. The conditions involve the spectral gap between the largest and second-largest eigenvalues of the true data matrix, the magnitude of the noise, and the structural properties of the missing pattern. The concepts of algebraic connectivity and irregularity are used to describe the properties in a graphical way. We empirically justify our theorem with synthetic data analysis. We show that the SDP algorithm outperforms other sparse PCA approaches especially when the observation pattern has good structural properties. As a by-product of our analysis, we provide two theorems to handle general missing schemes, which can be applied to other problems related to incomplete data matrices.
APA
Lee, H., Song, Q. & Honorio, J.. (2024). Support Recovery in Sparse PCA with General Missing Data. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:2163-2187 Available from https://proceedings.mlr.press/v244/lee24a.html.

Related Material