Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces

Amirali Abdullah; Ravi Kumar; Andrew McGregor; Sergei Vassilvitskii; Suresh Venkatasubramanian

Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces

Amirali Abdullah, Ravi Kumar, Andrew McGregor, Sergei Vassilvitskii, Suresh Venkatasubramanian

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:948-956, 2016.

Abstract

In this paper we show how to embed information distances like the χ^2 and Jensen-Shannon divergences efficiently in low dimensional spaces while preserving all pairwise distances. We then prove a dimensionality reduction result for the Hellinger, Jensen–Shannon, and χ^2 divergences that preserves the information geometry of the distributions, specifically, by retaining the simplex structure of the space. While our first result already implies these divergences can be explicitly embedded in the Euclidean space, retaining the simplex structure is important because it allows us to do inferences in the reduced space. We also show that these divergences can be sketched efficiently (i.e., up to a multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model.

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-abdullah16,
  title = 	 {Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces},
  author = 	 {Abdullah, Amirali and Kumar, Ravi and McGregor, Andrew and Vassilvitskii, Sergei and Venkatasubramanian, Suresh},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {948--956},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/abdullah16.pdf},
  url = 	 {https://proceedings.mlr.press/v51/abdullah16.html},
  abstract = 	 {In this paper we show how to embed information distances like the χ^2 and Jensen-Shannon divergences efficiently in low dimensional spaces while preserving all  pairwise distances.  We then prove a dimensionality reduction result for the Hellinger, Jensen–Shannon, and χ^2 divergences that preserves the information geometry of the distributions, specifically, by retaining the simplex structure of the space. While our first result already implies these divergences can be explicitly embedded in the Euclidean space, retaining the simplex structure is important because it allows us to do inferences in the reduced space.  We also show that these divergences  can be sketched efficiently (i.e., up to a multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model.}
}

Endnote

%0 Conference Paper
%T Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces
%A Amirali Abdullah
%A Ravi Kumar
%A Andrew McGregor
%A Sergei Vassilvitskii
%A Suresh Venkatasubramanian
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert	
%F pmlr-v51-abdullah16
%I PMLR
%P 948--956
%U https://proceedings.mlr.press/v51/abdullah16.html
%V 51
%X In this paper we show how to embed information distances like the χ^2 and Jensen-Shannon divergences efficiently in low dimensional spaces while preserving all  pairwise distances.  We then prove a dimensionality reduction result for the Hellinger, Jensen–Shannon, and χ^2 divergences that preserves the information geometry of the distributions, specifically, by retaining the simplex structure of the space. While our first result already implies these divergences can be explicitly embedded in the Euclidean space, retaining the simplex structure is important because it allows us to do inferences in the reduced space.  We also show that these divergences  can be sketched efficiently (i.e., up to a multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model.

RIS


TY  - CPAPER
TI  - Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces
AU  - Amirali Abdullah
AU  - Ravi Kumar
AU  - Andrew McGregor
AU  - Sergei Vassilvitskii
AU  - Suresh Venkatasubramanian
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert	
ID  - pmlr-v51-abdullah16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 948
EP  - 956
L1  - http://proceedings.mlr.press/v51/abdullah16.pdf
UR  - https://proceedings.mlr.press/v51/abdullah16.html
AB  - In this paper we show how to embed information distances like the χ^2 and Jensen-Shannon divergences efficiently in low dimensional spaces while preserving all  pairwise distances.  We then prove a dimensionality reduction result for the Hellinger, Jensen–Shannon, and χ^2 divergences that preserves the information geometry of the distributions, specifically, by retaining the simplex structure of the space. While our first result already implies these divergences can be explicitly embedded in the Euclidean space, retaining the simplex structure is important because it allows us to do inferences in the reduced space.  We also show that these divergences  can be sketched efficiently (i.e., up to a multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model.
ER  -

APA


Abdullah, A., Kumar, R., McGregor, A., Vassilvitskii, S. & Venkatasubramanian, S.. (2016). Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:948-956 Available from https://proceedings.mlr.press/v51/abdullah16.html.

Related Material

Download PDF