A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree

Yury Elkin; Vitaliy Kurlin

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:9267-9311, 2023.

Abstract

Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies the important problem of finding the k-nearest neighbors of every point q of Q in the set R in near-linear time. In a paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that this tree can be built in O(n log n) time while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. In 2015, section 5.3 of Curtin’s PhD thesis pointed out that the proof of the latter claim can have a serious gap in the time complexity estimation. A paper at TopoInVis 2022 reported explicit counterexamples for a key step in the proofs of both claims. These past obstacles are overcome by a simpler compressed cover tree on the reference set R. The first new algorithm constructs a compressed cover tree in O(n log n) time. The second new algorithm finds all k-nearest neighbors of all points from Q using a compressed cover tree in O(m(k+log n)log k) time with a hidden dimensionality factor that depends on the point distributions of the sets R and Q but not on their sizes.
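
To make the problem statement concrete, here is a minimal brute-force baseline in Python. It is not the compressed cover tree algorithm from the paper. It only illustrates the task (the k nearest points of R for every q in Q) and assumes points are coordinate tuples compared by Euclidean distance. The function name `knn_brute_force` and the sample data are hypothetical. This baseline evaluates all m·n distances, which is the cost the paper's bound O(m(k+log n)log k) is meant to avoid.

```python
import heapq
import math


def knn_brute_force(R, Q, k):
    """Return, for each q in Q, the k points of R closest to q.

    Illustrative baseline only: every query scans the whole
    reference set, so the cost grows like m * n distance evaluations.
    Euclidean distance is an assumption; the paper works in a
    general metric space.
    """
    result = []
    for q in Q:
        # nsmallest returns the k closest points in ascending distance order
        nearest = heapq.nsmallest(k, R, key=lambda r: math.dist(q, r))
        result.append(nearest)
    return result


# Hypothetical sample data: reference set R (n = 4), query set Q (m = 2)
R = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
Q = [(0.1, 0.1), (4.0, 4.0)]
neighbors = knn_brute_force(R, Q, k=2)
```

Here `neighbors[i]` lists the two points of R closest to `Q[i]`, nearest first.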

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-elkin23a,
  title     = {A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree},
  author    = {Elkin, Yury and Kurlin, Vitaliy},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {9267--9311},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/elkin23a/elkin23a.pdf},
  url       = {https://proceedings.mlr.press/v202/elkin23a.html},
  abstract  = {Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies the important problem of finding the k-nearest neighbors of every point q of Q in the set R in near-linear time. In a paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that this tree can be built in O(n log n) time while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. In 2015, section 5.3 of Curtin’s PhD thesis pointed out that the proof of the latter claim can have a serious gap in the time complexity estimation. A paper at TopoInVis 2022 reported explicit counterexamples for a key step in the proofs of both claims. These past obstacles are overcome by a simpler compressed cover tree on the reference set R. The first new algorithm constructs a compressed cover tree in O(n log n) time. The second new algorithm finds all k-nearest neighbors of all points from Q using a compressed cover tree in O(m(k+log n)log k) time with a hidden dimensionality factor that depends on the point distributions of the sets R and Q but not on their sizes.}
}

Endnote

%0 Conference Paper
%T A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree
%A Yury Elkin
%A Vitaliy Kurlin
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-elkin23a
%I PMLR
%P 9267--9311
%U https://proceedings.mlr.press/v202/elkin23a.html
%V 202
%X Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies the important problem of finding the k-nearest neighbors of every point q of Q in the set R in near-linear time. In a paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that this tree can be built in O(n log n) time while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. In 2015, section 5.3 of Curtin’s PhD thesis pointed out that the proof of the latter claim can have a serious gap in the time complexity estimation. A paper at TopoInVis 2022 reported explicit counterexamples for a key step in the proofs of both claims. These past obstacles are overcome by a simpler compressed cover tree on the reference set R. The first new algorithm constructs a compressed cover tree in O(n log n) time. The second new algorithm finds all k-nearest neighbors of all points from Q using a compressed cover tree in O(m(k+log n)log k) time with a hidden dimensionality factor that depends on the point distributions of the sets R and Q but not on their sizes.

APA

Elkin, Y. & Kurlin, V. (2023). A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:9267-9311. Available from https://proceedings.mlr.press/v202/elkin23a.html.
