Approximate Nearest Neighbors in Limited Space

Piotr Indyk; Tal Wagner

Proceedings of the 31st Conference On Learning Theory, PMLR 75:2012-2036, 2018.

Abstract

We consider the $(1+\epsilon)$-approximate nearest neighbor search problem: given a set $X$ of $n$ points in a $d$-dimensional space, build a data structure that, given any query point $y$, finds a point $x \in X$ whose distance to $y$ is at most $(1+\epsilon) \min_{x \in X} \|x-y\|$ for an accuracy parameter $\epsilon \in (0,1)$. Our main result is a data structure that occupies only $O(\epsilon^{-2} n \log(n) \log(1/\epsilon))$ bits of space, assuming all point coordinates are integers in the range $\{-n^{O(1)} \ldots n^{O(1)}\}$, i.e., the coordinates have $O(\log n)$ bits of precision. This improves over the best previously known space bound of $O(\epsilon^{-2} n \log(n)^2)$, obtained via the randomized dimensionality reduction method of Johnson and Lindenstrauss (1984). We also consider the more general problem of estimating all distances from a collection of query points to all data points $X$, and provide almost tight upper and lower bounds for the space complexity of this problem.
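
The definition above can be made concrete with a small sketch. The following Python is only an illustration of the $(1+\epsilon)$-approximation condition and of the ratio between the two space bounds quoted in the abstract; it is not the data structure from the paper. It assumes Euclidean distance, drops all constants hidden in the $O(\cdot)$ notation, and uses hypothetical function names (`is_approx_nearest`, `space_bound_ratio`) that do not come from the source.

```python
import math


def is_approx_nearest(candidate, query, points, eps):
    """Check the (1+eps)-approximate nearest neighbor condition.

    Returns True if dist(candidate, query) <= (1 + eps) * min_x dist(x, query),
    with the minimum computed by brute force over `points`.
    Assumption: distances are Euclidean.
    """
    best = min(math.dist(p, query) for p in points)
    return math.dist(candidate, query) <= (1 + eps) * best


def space_bound_ratio(n, eps):
    """Ratio of the earlier bound to the new bound, constants ignored.

    Earlier bound: eps^-2 * n * log(n)^2
    New bound:     eps^-2 * n * log(n) * log(1/eps)
    The common factor eps^-2 * n * log(n) cancels, leaving log(n) / log(1/eps).
    """
    return math.log2(n) / math.log2(1 / eps)


points = [(0, 0), (3, 4), (10, 0)]
query = (1, 0)

# (0, 0) is the exact nearest neighbor, so it satisfies the condition.
print(is_approx_nearest((0, 0), query, points, eps=0.1))

# (3, 4) is much farther from the query than (0, 0), so it does not.
print(is_approx_nearest((3, 4), query, points, eps=0.1))

# Asymptotic saving factor for n = 2^20 points and eps = 1/4.
print(space_bound_ratio(2**20, 0.25))
```

With constants ignored, the saving factor is $\log(n) / \log(1/\epsilon)$, so the improvement grows with $n$ for any fixed $\epsilon$.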

Cite this Paper

BibTeX


@InProceedings{pmlr-v75-indyk18a,
  title = 	 {Approximate Nearest Neighbors in Limited Space},
  author =       {Indyk, Piotr and Wagner, Tal},
  booktitle = 	 {Proceedings of the 31st Conference On Learning Theory},
  pages = 	 {2012--2036},
  year = 	 {2018},
  editor = 	 {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = 	 {75},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v75/indyk18a/indyk18a.pdf},
  url = 	 {https://proceedings.mlr.press/v75/indyk18a.html},
  abstract = 	 {We consider the $(1+\epsilon)$-approximate nearest neighbor search problem: given a set $X$ of $n$ points in a $d$-dimensional space, build a data structure that, given any query point $y$, finds a point $x \in X$ whose distance to $y$ is at most $(1+\epsilon) \min_{x \in X} \|x-y\|$ for an accuracy parameter $\epsilon \in (0,1)$. Our main result is a data structure that occupies only $O(\epsilon^{-2} n \log(n) \log(1/\epsilon))$ bits of space, assuming all point coordinates are integers in the range $\{-n^{O(1)} \ldots n^{O(1)}\}$, i.e., the coordinates have $O(\log n)$ bits of precision. This improves over the best previously known space bound of $O(\epsilon^{-2} n \log(n)^2)$, obtained via the randomized dimensionality reduction method of Johnson and Lindenstrauss (1984). We also consider the more general problem of estimating all distances from a collection of query points to all data points $X$, and provide almost tight upper and lower bounds for the space complexity of this problem.}
}

Endnote

%0 Conference Paper
%T Approximate Nearest Neighbors in Limited Space
%A Piotr Indyk
%A Tal Wagner
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet
%F pmlr-v75-indyk18a
%I PMLR
%P 2012--2036
%U https://proceedings.mlr.press/v75/indyk18a.html
%V 75
%X We consider the $(1+\epsilon)$-approximate nearest neighbor search problem: given a set $X$ of $n$ points in a $d$-dimensional space, build a data structure that, given any query point $y$, finds a point $x \in X$ whose distance to $y$ is at most $(1+\epsilon) \min_{x \in X} \|x-y\|$ for an accuracy parameter $\epsilon \in (0,1)$. Our main result is a data structure that occupies only $O(\epsilon^{-2} n \log(n) \log(1/\epsilon))$ bits of space, assuming all point coordinates are integers in the range $\{-n^{O(1)} \ldots n^{O(1)}\}$, i.e., the coordinates have $O(\log n)$ bits of precision. This improves over the best previously known space bound of $O(\epsilon^{-2} n \log(n)^2)$, obtained via the randomized dimensionality reduction method of Johnson and Lindenstrauss (1984). We also consider the more general problem of estimating all distances from a collection of query points to all data points $X$, and provide almost tight upper and lower bounds for the space complexity of this problem.

APA


Indyk, P. & Wagner, T. (2018). Approximate Nearest Neighbors in Limited Space. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:2012-2036. Available from https://proceedings.mlr.press/v75/indyk18a.html.
