Binary Embedding: Fundamental Limits and Fast Algorithm

Xinyang Yi; Constantine Caramanis; Eric Price

Binary Embedding: Fundamental Limits and Fast Algorithm

Xinyang Yi, Constantine Caramanis, Eric Price

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:2162-2170, 2015.

Abstract

Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary N distinct points in \mathbbS^p-1, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δuniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time O(mp). We make three contributions: (1) we establish a lower bound that shows any binary embedding oblivious to the set of points requires m =Ω(\frac1δ^2\logN) bits and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) we propose a novel fast binary embedding algorithm with provably optimal bit complexity m = O(\frac1 δ^2\logN) and near linear running time O(p \log p) whenever \log N ≪δ\sqrtp, with a slightly worse running time for larger \log N; (3) we also provide an analytic result about embedding a general set of points K ⊆\mathbbS^p-1 with even infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-yi15,
  title = 	 {Binary Embedding: Fundamental Limits and Fast Algorithm},
  author = 	 {Yi, Xinyang and Caramanis, Constantine and Price, Eric},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {2162--2170},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/yi15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/yi15.html},
  abstract = 	 {Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary N distinct points in \mathbbS^p-1, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δuniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time O(mp). We make three contributions: (1) we establish a lower bound that shows any binary embedding oblivious to the set of points requires m =Ω(\frac1δ^2\logN) bits and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) we propose a novel fast binary embedding algorithm with provably optimal bit complexity m = O(\frac1 δ^2\logN) and near linear running time O(p \log p) whenever \log N ≪δ\sqrtp, with a slightly worse running time for larger \log N; (3) we also provide an analytic result about embedding a general set of points K ⊆\mathbbS^p-1 with even infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.}
}

Endnote

%0 Conference Paper
%T Binary Embedding: Fundamental Limits and Fast Algorithm
%A Xinyang Yi
%A Constantine Caramanis
%A Eric Price
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-yi15
%I PMLR
%P 2162--2170
%U https://proceedings.mlr.press/v37/yi15.html
%V 37
%X Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary N distinct points in \mathbbS^p-1, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δuniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time O(mp). We make three contributions: (1) we establish a lower bound that shows any binary embedding oblivious to the set of points requires m =Ω(\frac1δ^2\logN) bits and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) we propose a novel fast binary embedding algorithm with provably optimal bit complexity m = O(\frac1 δ^2\logN) and near linear running time O(p \log p) whenever \log N ≪δ\sqrtp, with a slightly worse running time for larger \log N; (3) we also provide an analytic result about embedding a general set of points K ⊆\mathbbS^p-1 with even infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.

RIS


TY  - CPAPER
TI  - Binary Embedding: Fundamental Limits and Fast Algorithm
AU  - Xinyang Yi
AU  - Constantine Caramanis
AU  - Eric Price
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-yi15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 2162
EP  - 2170
L1  - http://proceedings.mlr.press/v37/yi15.pdf
UR  - https://proceedings.mlr.press/v37/yi15.html
AB  - Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary N distinct points in \mathbbS^p-1, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δuniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time O(mp). We make three contributions: (1) we establish a lower bound that shows any binary embedding oblivious to the set of points requires m =Ω(\frac1δ^2\logN) bits and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) we propose a novel fast binary embedding algorithm with provably optimal bit complexity m = O(\frac1 δ^2\logN) and near linear running time O(p \log p) whenever \log N ≪δ\sqrtp, with a slightly worse running time for larger \log N; (3) we also provide an analytic result about embedding a general set of points K ⊆\mathbbS^p-1 with even infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.
ER  -

APA


Yi, X., Caramanis, C. & Price, E.. (2015). Binary Embedding: Fundamental Limits and Fast Algorithm. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:2162-2170 Available from https://proceedings.mlr.press/v37/yi15.html.

Binary Embedding: Fundamental Limits and Fast Algorithm

Abstract

Cite this Paper

Related Material