A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree

Yury Elkin, Vitaliy Kurlin
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:9267-9311, 2023.

Abstract

Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies the important problem of finding k-nearest neighbors of every point q of Q in the set R in near-linear time. In a paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that this tree can be built in O(n log n) time while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. In 2015, section 5.3 of Curtin’s PhD thesis pointed out that the proof of the latter claim can have a serious gap in time complexity estimation. A paper at TopoInVis 2022 reported explicit counterexamples for a key step in the proofs of both claims. The past obstacles will be overcome by a simpler compressed cover tree on the reference set R. The first new algorithm constructs a compressed cover tree in O(n log n) time. The second new algorithm finds all k-nearest neighbors of all points from Q using a compressed cover tree in time O(m(k+log n)log k) with a hidden dimensionality factor depending on point distributions of the sets R,Q but not on their sizes.
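To make the problem statement concrete, the following is a minimal brute-force sketch of k-nearest neighbor search. It is not the compressed cover tree algorithm of the paper: it only illustrates the input (R, Q, k) and the expected output, and it computes all m·n distances, which is the cost the paper aims to avoid. The function name `knn_brute_force`, the sample points, and the choice of the Euclidean metric are assumptions made for illustration.

```python
import heapq
import math


def knn_brute_force(R, Q, k):
    """Return, for each query q in Q, the k points of R closest to q.

    Baseline only: every query is compared against every reference point.
    The Euclidean distance is an assumption; any metric could be substituted.
    """
    return [heapq.nsmallest(k, R, key=lambda r: math.dist(q, r)) for q in Q]


# Hypothetical sample data: a reference set R of n = 4 points
# and a query set Q of m = 2 points in the plane.
R = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
Q = [(0.1, 0.0), (2.5, 2.5)]

neighbors = knn_brute_force(R, Q, k=2)
for q, nn in zip(Q, neighbors):
    print(q, "->", nn)
```

Each inner list is ordered from the nearest to the farthest of the k neighbors found for the corresponding query point.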

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-elkin23a,
  title = {A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree},
  author = {Elkin, Yury and Kurlin, Vitaliy},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {9267--9311},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/elkin23a/elkin23a.pdf},
  url = {https://proceedings.mlr.press/v202/elkin23a.html},
  abstract = {Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies the important problem of finding k-nearest neighbors of every point q of Q in the set R in near-linear time. In a paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that this tree can be built in O(n log n) time while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. In 2015, section 5.3 of Curtin’s PhD thesis pointed out that the proof of the latter claim can have a serious gap in time complexity estimation. A paper at TopoInVis 2022 reported explicit counterexamples for a key step in the proofs of both claims. The past obstacles will be overcome by a simpler compressed cover tree on the reference set R. The first new algorithm constructs a compressed cover tree in O(n log n) time. The second new algorithm finds all k-nearest neighbors of all points from Q using a compressed cover tree in time O(m(k+log n)log k) with a hidden dimensionality factor depending on point distributions of the sets R,Q but not on their sizes.}
}
Endnote
%0 Conference Paper
%T A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree
%A Yury Elkin
%A Vitaliy Kurlin
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-elkin23a
%I PMLR
%P 9267--9311
%U https://proceedings.mlr.press/v202/elkin23a.html
%V 202
%X Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies the important problem of finding k-nearest neighbors of every point q of Q in the set R in near-linear time. In a paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that this tree can be built in O(n log n) time while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. In 2015, section 5.3 of Curtin’s PhD thesis pointed out that the proof of the latter claim can have a serious gap in time complexity estimation. A paper at TopoInVis 2022 reported explicit counterexamples for a key step in the proofs of both claims. The past obstacles will be overcome by a simpler compressed cover tree on the reference set R. The first new algorithm constructs a compressed cover tree in O(n log n) time. The second new algorithm finds all k-nearest neighbors of all points from Q using a compressed cover tree in time O(m(k+log n)log k) with a hidden dimensionality factor depending on point distributions of the sets R,Q but not on their sizes.
APA
Elkin, Y. & Kurlin, V. (2023). A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:9267-9311. Available from https://proceedings.mlr.press/v202/elkin23a.html.

Related Material