Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces

Amirali Abdullah, Ravi Kumar, Andrew McGregor, Sergei Vassilvitskii, Suresh Venkatasubramanian
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:948-956, 2016.

Abstract

In this paper we show how to embed information distances like the χ^2 and Jensen–Shannon divergences efficiently in low-dimensional spaces while preserving all pairwise distances. We then prove a dimensionality reduction result for the Hellinger, Jensen–Shannon, and χ^2 divergences that preserves the information geometry of the distributions; specifically, it retains the simplex structure of the space. While our first result already implies these divergences can be explicitly embedded in Euclidean space, retaining the simplex structure is important because it allows us to perform inference in the reduced space. We also show that these divergences can be sketched efficiently (i.e., up to a multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model.
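To give a concrete feel for why these divergences are "Euclidean in disguise", the sketch below (not from the paper; dimensions, seed, and the plain Gaussian projection are illustrative choices) uses the standard square-root map, under which Hellinger distance is exactly a scaled Euclidean distance, and then applies an ordinary Johnson–Lindenstrauss random projection. Note that a plain JL projection preserves distances but not the simplex structure; preserving that structure is precisely what the paper's dimensionality reduction result adds.

```python
import numpy as np

rng = np.random.default_rng(0)

def hellinger(p, q):
    # H(p, q) = ||sqrt(p) - sqrt(q)||_2 / sqrt(2): the square-root map
    # sends the probability simplex onto the unit sphere, turning
    # Hellinger distance into (scaled) Euclidean distance.
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

# Two random distributions on a large support.
d = 1000
p = rng.dirichlet(np.ones(d))
q = rng.dirichlet(np.ones(d))

# A Johnson-Lindenstrauss Gaussian projection of the square-root
# embeddings approximately preserves the Euclidean (hence Hellinger)
# distance in k << d dimensions -- but the images no longer lie on
# a simplex, which is the limitation the paper's result overcomes.
k = 200
G = rng.normal(size=(k, d)) / np.sqrt(k)
approx = np.linalg.norm(G @ (np.sqrt(p) - np.sqrt(q))) / np.sqrt(2)

print(hellinger(p, q), approx)
```

With k on the order of a few hundred, the projected distance typically matches the true Hellinger distance to within a few percent.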

Cite this Paper

BibTeX
@InProceedings{pmlr-v51-abdullah16,
  title = {Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces},
  author = {Abdullah, Amirali and Kumar, Ravi and McGregor, Andrew and Vassilvitskii, Sergei and Venkatasubramanian, Suresh},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = {948--956},
  year = {2016},
  editor = {Gretton, Arthur and Robert, Christian C.},
  volume = {51},
  series = {Proceedings of Machine Learning Research},
  address = {Cadiz, Spain},
  month = {09--11 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v51/abdullah16.pdf},
  url = {https://proceedings.mlr.press/v51/abdullah16.html},
  abstract = {In this paper we show how to embed information distances like the χ^2 and Jensen-Shannon divergences efficiently in low dimensional spaces while preserving all pairwise distances. We then prove a dimensionality reduction result for the Hellinger, Jensen–Shannon, and χ^2 divergences that preserves the information geometry of the distributions, specifically, by retaining the simplex structure of the space. While our first result already implies these divergences can be explicitly embedded in the Euclidean space, retaining the simplex structure is important because it allows us to do inferences in the reduced space. We also show that these divergences can be sketched efficiently (i.e., up to a multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model.}
}
Endnote
%0 Conference Paper
%T Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces
%A Amirali Abdullah
%A Ravi Kumar
%A Andrew McGregor
%A Sergei Vassilvitskii
%A Suresh Venkatasubramanian
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-abdullah16
%I PMLR
%P 948--956
%U https://proceedings.mlr.press/v51/abdullah16.html
%V 51
%X In this paper we show how to embed information distances like the χ^2 and Jensen-Shannon divergences efficiently in low dimensional spaces while preserving all pairwise distances. We then prove a dimensionality reduction result for the Hellinger, Jensen–Shannon, and χ^2 divergences that preserves the information geometry of the distributions, specifically, by retaining the simplex structure of the space. While our first result already implies these divergences can be explicitly embedded in the Euclidean space, retaining the simplex structure is important because it allows us to do inferences in the reduced space. We also show that these divergences can be sketched efficiently (i.e., up to a multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model.
RIS
TY - CPAPER
TI - Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces
AU - Amirali Abdullah
AU - Ravi Kumar
AU - Andrew McGregor
AU - Sergei Vassilvitskii
AU - Suresh Venkatasubramanian
BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA - 2016/05/02
ED - Arthur Gretton
ED - Christian C. Robert
ID - pmlr-v51-abdullah16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 51
SP - 948
EP - 956
L1 - http://proceedings.mlr.press/v51/abdullah16.pdf
UR - https://proceedings.mlr.press/v51/abdullah16.html
AB - In this paper we show how to embed information distances like the χ^2 and Jensen-Shannon divergences efficiently in low dimensional spaces while preserving all pairwise distances. We then prove a dimensionality reduction result for the Hellinger, Jensen–Shannon, and χ^2 divergences that preserves the information geometry of the distributions, specifically, by retaining the simplex structure of the space. While our first result already implies these divergences can be explicitly embedded in the Euclidean space, retaining the simplex structure is important because it allows us to do inferences in the reduced space. We also show that these divergences can be sketched efficiently (i.e., up to a multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model.
ER -
APA
Abdullah, A., Kumar, R., McGregor, A., Vassilvitskii, S. & Venkatasubramanian, S. (2016). Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:948-956. Available from https://proceedings.mlr.press/v51/abdullah16.html.