Randomized partition trees for exact nearest neighbor search

Sanjoy Dasgupta, Kaushik Sinha
Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:317-337, 2013.

Abstract

The k-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in two situations of interest: the first, when data come from a doubling measure, and the second, when the data are documents from a topic model.
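
The potential function mentioned in the abstract measures how close the remaining data points are to being ties for the nearest neighbor. With the points ordered so that x_(1) is closest to the query q, it is an average of ratios of roughly the following form (this is a paraphrase; the exact normalization used in the paper is an assumption here):

    \Phi_m(q) \;=\; \frac{1}{m} \sum_{i=2}^{m} \frac{\|q - x_{(1)}\|}{\|q - x_{(i)}\|}

This quantity is near zero when the nearest neighbor is well separated from the rest of the data, and near one when many points are almost equidistant from q, which is exactly the regime in which a random partition is likely to separate q from x_(1).

To make the data structures concrete, here is a minimal sketch (in Python with NumPy) of the simplest scheme in this family: a random projection tree queried by "defeatist" search, i.e., descending to a single leaf with no backtracking. The class name RPTree and the leaf-size parameter n0 are illustrative choices, not the authors' code; the variants analyzed in the paper additionally use perturbed split points and overlapping ("spill") cells.

    import numpy as np

    class RPTree:
        """Random projection tree: each cell is split by a random
        hyperplane (Gaussian direction, median threshold) until cells
        hold at most n0 points. Illustrative sketch only, not the
        authors' reference implementation."""

        def __init__(self, points, n0=10, rng=None):
            self.points = np.asarray(points, dtype=float)
            self.rng = rng if rng is not None else np.random.default_rng()
            self.n0 = n0
            self.root = self._build(np.arange(len(self.points)))

        def _build(self, idx):
            if len(idx) <= self.n0:
                return {"leaf": idx}
            u = self.rng.standard_normal(self.points.shape[1])  # random direction
            proj = self.points[idx] @ u
            t = np.median(proj)                                 # median split
            left, right = idx[proj <= t], idx[proj > t]
            if len(left) == 0 or len(right) == 0:               # guard against all-ties
                return {"leaf": idx}
            return {"u": u, "t": t,
                    "left": self._build(left), "right": self._build(right)}

        def query(self, q):
            """Defeatist search: walk down to a single leaf without
            backtracking; return the index of the closest point there."""
            q = np.asarray(q, dtype=float)
            node = self.root
            while "leaf" not in node:
                node = node["left"] if q @ node["u"] <= node["t"] else node["right"]
            cand = node["leaf"]
            return cand[np.argmin(np.linalg.norm(self.points[cand] - q, axis=1))]

Because each tree is built with independent randomness, querying several trees and keeping the best candidate drives the failure probability down exponentially in the number of trees:

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 50))
    q = rng.standard_normal(50)
    candidates = {RPTree(X, n0=20).query(q) for _ in range(10)}
    best = min(candidates, key=lambda i: np.linalg.norm(X[i] - q))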

Cite this Paper


BibTeX
@InProceedings{pmlr-v30-Dasgupta13,
  title     = {Randomized partition trees for exact nearest neighbor search},
  author    = {Dasgupta, Sanjoy and Sinha, Kaushik},
  booktitle = {Proceedings of the 26th Annual Conference on Learning Theory},
  pages     = {317--337},
  year      = {2013},
  editor    = {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume    = {30},
  series    = {Proceedings of Machine Learning Research},
  address   = {Princeton, NJ, USA},
  month     = {12--14 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v30/Dasgupta13.pdf},
  url       = {https://proceedings.mlr.press/v30/Dasgupta13.html}
}
