Principal Component Analysis on non-Gaussian Dependent Data

Fang Han; Han Liu

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(1):240-248, 2013.

Abstract

In this paper, we analyze the performance of a semiparametric principal component analysis method named Copula Component Analysis (COCA) (Han & Liu, 2012) when the data are dependent. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. We study the scenario where the observations are drawn from non-i.i.d. processes ($m$-dependency or a more general $\phi$-mixing case). We show that COCA can tolerate weak dependence. In particular, we provide generalization bounds of convergence for both support recovery and parameter estimation of COCA for dependent data. We provide explicit sufficient conditions on the degree of dependence under which the parametric rate can be maintained. To our knowledge, this is the first work analyzing the theoretical performance of PCA for dependent data in high dimensional settings. Our results strictly generalize the analysis in Han & Liu (2012), and the techniques we use are of separate interest for analyzing a variety of other multivariate statistical methods.

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-han13,
  title = 	 {Principal Component Analysis on non-{G}aussian Dependent Data},
  author = 	 {Han, Fang and Liu, Han},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {240--248},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/han13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/han13.html},
  abstract = 	 {In this paper, we analyze the performance of a semiparametric principal component analysis named Copula Component Analysis (COCA) (Han & Liu, 2012) when the data are dependent. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. We study the scenario where the observations are drawn from non-i.i.d. processes ($m$-dependency or a more general $\phi$-mixing case). We show that COCA can allow weak dependence. In particular, we provide the generalization bounds of convergence for both support recovery and parameter estimation of COCA for the dependent data. We provide explicit sufficient conditions on the degree of dependence, under which the parametric rate can be maintained. To our knowledge, this is the first work analyzing the theoretical performance of PCA for the dependent data in high dimensional settings. Our results strictly generalize the analysis in Han & Liu (2012) and the techniques we used have the separate interest for analyzing a variety of other multivariate statistical methods.}
}

Endnote

%0 Conference Paper
%T Principal Component Analysis on non-Gaussian Dependent Data
%A Fang Han
%A Han Liu
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-han13
%I PMLR
%P 240--248
%U https://proceedings.mlr.press/v28/han13.html
%V 28
%N 1
%X In this paper, we analyze the performance of a semiparametric principal component analysis named Copula Component Analysis (COCA) (Han & Liu, 2012) when the data are dependent. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. We study the scenario where the observations are drawn from non-i.i.d. processes ($m$-dependency or a more general $\phi$-mixing case). We show that COCA can allow weak dependence. In particular, we provide the generalization bounds of convergence for both support recovery and parameter estimation of COCA for the dependent data. We provide explicit sufficient conditions on the degree of dependence, under which the parametric rate can be maintained. To our knowledge, this is the first work analyzing the theoretical performance of PCA for the dependent data in high dimensional settings. Our results strictly generalize the analysis in Han & Liu (2012) and the techniques we used have the separate interest for analyzing a variety of other multivariate statistical methods.

RIS


TY  - CPAPER
TI  - Principal Component Analysis on non-Gaussian Dependent Data
AU  - Fang Han
AU  - Han Liu
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/02/13
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-han13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 1
SP  - 240
EP  - 248
L1  - http://proceedings.mlr.press/v28/han13.pdf
UR  - https://proceedings.mlr.press/v28/han13.html
AB  - In this paper, we analyze the performance of a semiparametric principal component analysis named Copula Component Analysis (COCA) (Han & Liu, 2012) when the data are dependent. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. We study the scenario where the observations are drawn from non-i.i.d. processes ($m$-dependency or a more general $\phi$-mixing case). We show that COCA can allow weak dependence. In particular, we provide the generalization bounds of convergence for both support recovery and parameter estimation of COCA for the dependent data. We provide explicit sufficient conditions on the degree of dependence, under which the parametric rate can be maintained. To our knowledge, this is the first work analyzing the theoretical performance of PCA for the dependent data in high dimensional settings. Our results strictly generalize the analysis in Han & Liu (2012) and the techniques we used have the separate interest for analyzing a variety of other multivariate statistical methods.
ER  -

APA


Han, F. & Liu, H. (2013). Principal Component Analysis on non-Gaussian Dependent Data. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(1):240-248. Available from https://proceedings.mlr.press/v28/han13.html.
