Scalable geometric density estimation

Ye Wang, Antonio Canale, David Dunson
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:857-865, 2016.

Abstract

It is standard to assume a low-dimensional structure in estimating a high-dimensional density. However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior. With these axes pre-computed using fast singular value decomposition, GEODE easily scales to high dimensional problems while providing uncertainty characterization. The model is also capable of imputing missing data and dynamically deleting redundant dimensions. Finally, we generalize GEODE by mixing it across a dyadic clustering tree. Both simulation studies and real world data applications show superior performance of GEODE in terms of robustness and computational efficiency.

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-wang16e, title = {Scalable geometric density estimation}, author = {Wang, Ye and Canale, Antonio and Dunson, David}, booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics}, pages = {857--865}, year = {2016}, editor = {Gretton, Arthur and Robert, Christian C.}, volume = {51}, series = {Proceedings of Machine Learning Research}, address = {Cadiz, Spain}, month = {09--11 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v51/wang16e.pdf}, url = {https://proceedings.mlr.press/v51/wang16e.html}, abstract = {It is standard to assume a low-dimensional structure in estimating a high-dimensional density. However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior. With these axes pre-computed using fast singular value decomposition, GEODE easily scales to high dimensional problems while providing uncertainty characterization. The model is also capable of imputing missing data and dynamically deleting redundant dimensions. Finally, we generalize GEODE by mixing it across a dyadic clustering tree. Both simulation studies and real world data applications show superior performance of GEODE in terms of robustness and computational efficiency.} }
Endnote
%0 Conference Paper %T Scalable geometric density estimation %A Ye Wang %A Antonio Canale %A David Dunson %B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2016 %E Arthur Gretton %E Christian C. Robert %F pmlr-v51-wang16e %I PMLR %P 857--865 %U https://proceedings.mlr.press/v51/wang16e.html %V 51 %X It is standard to assume a low-dimensional structure in estimating a high-dimensional density. However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior. With these axes pre-computed using fast singular value decomposition, GEODE easily scales to high dimensional problems while providing uncertainty characterization. The model is also capable of imputing missing data and dynamically deleting redundant dimensions. Finally, we generalize GEODE by mixing it across a dyadic clustering tree. Both simulation studies and real world data applications show superior performance of GEODE in terms of robustness and computational efficiency.
RIS
TY - CPAPER TI - Scalable geometric density estimation AU - Ye Wang AU - Antonio Canale AU - David Dunson BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics DA - 2016/05/02 ED - Arthur Gretton ED - Christian C. Robert ID - pmlr-v51-wang16e PB - PMLR DP - Proceedings of Machine Learning Research VL - 51 SP - 857 EP - 865 L1 - http://proceedings.mlr.press/v51/wang16e.pdf UR - https://proceedings.mlr.press/v51/wang16e.html AB - It is standard to assume a low-dimensional structure in estimating a high-dimensional density. However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior. With these axes pre-computed using fast singular value decomposition, GEODE easily scales to high dimensional problems while providing uncertainty characterization. The model is also capable of imputing missing data and dynamically deleting redundant dimensions. Finally, we generalize GEODE by mixing it across a dyadic clustering tree. Both simulation studies and real world data applications show superior performance of GEODE in terms of robustness and computational efficiency. ER -
APA
Wang, Y., Canale, A. & Dunson, D.. (2016). Scalable geometric density estimation. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:857-865 Available from https://proceedings.mlr.press/v51/wang16e.html.

Related Material