Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods

Yan Shuo Tan, Roman Vershynin
Proceedings of the 31st Conference On Learning Theory, PMLR 75:498-534, 2018.

Abstract

The problem of Non-Gaussian Component Analysis (NGCA) is about finding a maximal low-dimensional subspace $E$ in $\mathbb{R}^n$ so that data points projected onto $E$ follow a non-Gaussian distribution. Vempala and Xiao (2011) proposed a local search algorithm, and showed that it was able to estimate $E$ accurately with polynomial time and sample complexity, if the dimension of $E$ is treated as a constant and with the assumption that all one-dimensional marginals of the non-Gaussian distribution over $E$ have non-Gaussian moments. In this paper, we propose a simple spectral algorithm called \textsc{Reweighted PCA}, and prove that it possesses the same guarantee. The principle that underlies this approach is a new characterization of multivariate Gaussian distributions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-tan18a, title = {Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods}, author = {Tan, Yan Shuo and Vershynin, Roman}, booktitle = {Proceedings of the 31st Conference On Learning Theory}, pages = {498--534}, year = {2018}, editor = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe}, volume = {75}, series = {Proceedings of Machine Learning Research}, month = {06--09 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v75/tan18a/tan18a.pdf}, url = {https://proceedings.mlr.press/v75/tan18a.html}, abstract = {The problem of Non-Gaussian Component Analysis (NGCA) is about finding a maximal low-dimensional subspace $E$ in $\mathbb{R}^n$ so that data points projected onto $E$ follow a non-Gaussian distribution. Vempala and Xiao (2011) proposed a local search algorithm, and showed that it was able to estimate $E$ accurately with polynomial time and sample complexity, if the dimension of $E$ is treated as a constant and with the assumption that all one-dimensional marginals of the non-Gaussian distribution over $E$ have non-Gaussian moments. In this paper, we propose a simple spectral algorithm called \textsc{Reweighted PCA}, and prove that it possesses the same guarantee. The principle that underlies this approach is a new characterization of multivariate Gaussian distributions.} }
Endnote
%0 Conference Paper %T Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods %A Yan Shuo Tan %A Roman Vershynin %B Proceedings of the 31st Conference On Learning Theory %C Proceedings of Machine Learning Research %D 2018 %E Sébastien Bubeck %E Vianney Perchet %E Philippe Rigollet %F pmlr-v75-tan18a %I PMLR %P 498--534 %U https://proceedings.mlr.press/v75/tan18a.html %V 75 %X The problem of Non-Gaussian Component Analysis (NGCA) is about finding a maximal low-dimensional subspace $E$ in $\mathbb{R}^n$ so that data points projected onto $E$ follow a non-Gaussian distribution. Vempala and Xiao (2011) proposed a local search algorithm, and showed that it was able to estimate $E$ accurately with polynomial time and sample complexity, if the dimension of $E$ is treated as a constant and with the assumption that all one-dimensional marginals of the non-Gaussian distribution over $E$ have non-Gaussian moments. In this paper, we propose a simple spectral algorithm called \textsc{Reweighted PCA}, and prove that it possesses the same guarantee. The principle that underlies this approach is a new characterization of multivariate Gaussian distributions.
APA
Tan, Y.S. & Vershynin, R.. (2018). Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:498-534 Available from https://proceedings.mlr.press/v75/tan18a.html.

Related Material