Scalable geometric density estimation

Ye Wang; Antonio Canale; David Dunson

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:857-865, 2016.

Abstract

It is standard to assume a low-dimensional structure in estimating a high-dimensional density. However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior. With these axes pre-computed using fast singular value decomposition, GEODE easily scales to high dimensional problems while providing uncertainty characterization. The model is also capable of imputing missing data and dynamically deleting redundant dimensions. Finally, we generalize GEODE by mixing it across a dyadic clustering tree. Both simulation studies and real world data applications show superior performance of GEODE in terms of robustness and computational efficiency.
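The pre-computation step mentioned in the abstract, obtaining the d leading principal axes of the data by singular value decomposition, can be sketched as follows. This is only an illustrative sketch, not the authors' GEODE implementation: the function name `leading_principal_axes`, the toy data, and the use of NumPy's dense thin SVD (rather than any particular fast or randomized SVD routine) are assumptions made for the example.

```python
import numpy as np


def leading_principal_axes(X, d):
    """Return the d leading principal axes of the rows of X (hypothetical helper).

    X is an (n, D) data matrix. The result is a (D, d) matrix with
    orthonormal columns, plus the corresponding d singular values.
    """
    Xc = X - X.mean(axis=0)  # center each coordinate
    # Thin SVD of the centered data: the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T, s[:d]


# Toy data (an assumption for illustration): 200 points lying close to a
# 2-dimensional linear subspace of R^50, plus small isotropic noise.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(50, 2)))  # orthonormal basis of the subspace
scores = 5.0 * rng.normal(size=(200, 2))           # low-dimensional coordinates
X = scores @ basis.T + 0.1 * rng.normal(size=(200, 50))

W, s = leading_principal_axes(X, d=2)
print(W.shape)  # (50, 2)
```

The columns of `W` span the estimated subspace; in the setting the abstract describes, such axes would be computed once up front and then held fixed while the remaining model quantities are inferred.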

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-wang16e,
  title = 	 {Scalable geometric density estimation},
  author = 	 {Wang, Ye and Canale, Antonio and Dunson, David},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {857--865},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/wang16e.pdf},
  url = 	 {https://proceedings.mlr.press/v51/wang16e.html},
  abstract = 	 {It is standard to assume a low-dimensional structure in estimating a high-dimensional density.  However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior. With these axes pre-computed using fast singular value decomposition, GEODE easily scales to high dimensional problems while providing uncertainty characterization. The model is also capable of imputing missing data and dynamically deleting redundant dimensions. Finally, we generalize GEODE by mixing it across a dyadic clustering tree. Both simulation studies and real world data applications show superior performance of GEODE in terms of robustness and computational efficiency.}
}

Endnote

%0 Conference Paper
%T Scalable geometric density estimation
%A Ye Wang
%A Antonio Canale
%A David Dunson
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-wang16e
%I PMLR
%P 857--865
%U https://proceedings.mlr.press/v51/wang16e.html
%V 51
%X It is standard to assume a low-dimensional structure in estimating a high-dimensional density.  However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior. With these axes pre-computed using fast singular value decomposition, GEODE easily scales to high dimensional problems while providing uncertainty characterization. The model is also capable of imputing missing data and dynamically deleting redundant dimensions. Finally, we generalize GEODE by mixing it across a dyadic clustering tree. Both simulation studies and real world data applications show superior performance of GEODE in terms of robustness and computational efficiency.

RIS


TY  - CPAPER
TI  - Scalable geometric density estimation
AU  - Ye Wang
AU  - Antonio Canale
AU  - David Dunson
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-wang16e
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 857
EP  - 865
L1  - http://proceedings.mlr.press/v51/wang16e.pdf
UR  - https://proceedings.mlr.press/v51/wang16e.html
AB  - It is standard to assume a low-dimensional structure in estimating a high-dimensional density.  However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior. With these axes pre-computed using fast singular value decomposition, GEODE easily scales to high dimensional problems while providing uncertainty characterization. The model is also capable of imputing missing data and dynamically deleting redundant dimensions. Finally, we generalize GEODE by mixing it across a dyadic clustering tree. Both simulation studies and real world data applications show superior performance of GEODE in terms of robustness and computational efficiency.
ER  -

APA


Wang, Y., Canale, A. & Dunson, D. (2016). Scalable geometric density estimation. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:857-865. Available from https://proceedings.mlr.press/v51/wang16e.html.
