Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

Ke Li; Jitendra Malik

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

Ke Li, Jitendra Malik

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:671-679, 2016.

Abstract

Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by most existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in dimensionality of the space and sub-linear in the intrinsic dimensionality and the size of the dataset and takes space constant in dimensionality of the space and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in data density, supports dynamic updates to the dataset and is easy-to-implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitivity hashing (LSH) in terms of approximation quality, speed and space efficiency.

Cite this Paper

BibTeX

@InProceedings{pmlr-v48-lic16,
  title = 	 {Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing},
  author = 	 {Li, Ke and Malik, Jitendra},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {671--679},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/lic16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/lic16.html},
  abstract = 	 {Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by most existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in dimensionality of the space and sub-linear in the intrinsic dimensionality and the size of the dataset and takes space constant in dimensionality of the space and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in data density, supports dynamic updates to the dataset and is easy-to-implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitivity hashing (LSH) in terms of approximation quality, speed and space efficiency.}
}

Endnote

%0 Conference Paper
%T Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing
%A Ke Li
%A Jitendra Malik
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger	
%F pmlr-v48-lic16
%I PMLR
%P 671--679
%U https://proceedings.mlr.press/v48/lic16.html
%V 48
%X Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by most existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in dimensionality of the space and sub-linear in the intrinsic dimensionality and the size of the dataset and takes space constant in dimensionality of the space and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in data density, supports dynamic updates to the dataset and is easy-to-implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitivity hashing (LSH) in terms of approximation quality, speed and space efficiency.

RIS

TY  - CPAPER
TI  - Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing
AU  - Ke Li
AU  - Jitendra Malik
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger	
ID  - pmlr-v48-lic16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 671
EP  - 679
L1  - http://proceedings.mlr.press/v48/lic16.pdf
UR  - https://proceedings.mlr.press/v48/lic16.html
AB  - Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by most existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in dimensionality of the space and sub-linear in the intrinsic dimensionality and the size of the dataset and takes space constant in dimensionality of the space and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in data density, supports dynamic updates to the dataset and is easy-to-implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitivity hashing (LSH) in terms of approximation quality, speed and space efficiency.
ER  -

APA

Li, K. & Malik, J.. (2016). Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:671-679 Available from https://proceedings.mlr.press/v48/lic16.html.

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

Abstract

Cite this Paper

Related Material