Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search

Anshumali Shrivastava; Ping Li

Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search

Anshumali Shrivastava, Ping Li

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):557-565, 2014.

Abstract

The query complexity of \em locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size \citeProc:Indyk_STOC98. In industrial applications such as search where the data are often high-dimensional and binary (e.g., text n-grams), \em minwise hashing is widely adopted, which requires applying a large number of permutations on the data. This is costly in computation and energy-consumption. In this paper, we propose a hashing technique which generates all the necessary hash evaluations needed for similarity search, using one single permutation. The heart of the proposed hash function is a “rotation” scheme which densifies the sparse sketches of \em one permutation hashing \citeProc:Li_Owen_Zhang_NIPS12 in an unbiased fashion thereby maintaining the LSH property. This makes the obtained sketches suitable for hash table construction. This idea of rotation presented in this paper could be of independent interest for densifying other types of sparse sketches. Using our proposed hashing method, the query time of a (K,L)-parameterized LSH is reduced from the typical O(dKL) complexity to merely O(KL+dL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. Our experimental evaluation on real data confirms that the proposed scheme significantly reduces the query processing time over minwise hashing without loss in retrieval accuracies.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-shrivastava14,
  title = 	 {Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search},
  author = 	 {Shrivastava, Anshumali and Li, Ping},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {557--565},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/shrivastava14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/shrivastava14.html},
  abstract = 	 {The query complexity of \em locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size \citeProc:Indyk_STOC98. In industrial applications such as search where the data are often high-dimensional and binary (e.g., text n-grams),  \em minwise hashing is widely adopted, which requires applying a large number  of permutations on the data. This is  costly in computation and energy-consumption.    In this paper, we propose a  hashing technique which generates all the necessary hash evaluations needed for similarity search, using  one single permutation.  The heart of the proposed hash function is a  “rotation” scheme which densifies the sparse sketches of \em one permutation hashing \citeProc:Li_Owen_Zhang_NIPS12 in an unbiased fashion thereby maintaining the LSH property. This makes the obtained sketches suitable for hash table construction. This idea of rotation presented in this paper could be of independent  interest  for densifying  other types of sparse sketches.     Using our proposed hashing method, the  query time of a (K,L)-parameterized LSH is reduced from  the typical O(dKL) complexity to merely O(KL+dL), where d is the  number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables.  Our experimental evaluation on real data confirms that the proposed scheme significantly reduces the query processing time over minwise hashing without loss in retrieval accuracies.}
}

Endnote

%0 Conference Paper
%T Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search
%A Anshumali Shrivastava
%A Ping Li
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-shrivastava14
%I PMLR
%P 557--565
%U https://proceedings.mlr.press/v32/shrivastava14.html
%V 32
%N 1
%X The query complexity of \em locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size \citeProc:Indyk_STOC98. In industrial applications such as search where the data are often high-dimensional and binary (e.g., text n-grams),  \em minwise hashing is widely adopted, which requires applying a large number  of permutations on the data. This is  costly in computation and energy-consumption.    In this paper, we propose a  hashing technique which generates all the necessary hash evaluations needed for similarity search, using  one single permutation.  The heart of the proposed hash function is a  “rotation” scheme which densifies the sparse sketches of \em one permutation hashing \citeProc:Li_Owen_Zhang_NIPS12 in an unbiased fashion thereby maintaining the LSH property. This makes the obtained sketches suitable for hash table construction. This idea of rotation presented in this paper could be of independent  interest  for densifying  other types of sparse sketches.     Using our proposed hashing method, the  query time of a (K,L)-parameterized LSH is reduced from  the typical O(dKL) complexity to merely O(KL+dL), where d is the  number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables.  Our experimental evaluation on real data confirms that the proposed scheme significantly reduces the query processing time over minwise hashing without loss in retrieval accuracies.

RIS


TY  - CPAPER
TI  - Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search
AU  - Anshumali Shrivastava
AU  - Ping Li
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-shrivastava14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 557
EP  - 565
L1  - http://proceedings.mlr.press/v32/shrivastava14.pdf
UR  - https://proceedings.mlr.press/v32/shrivastava14.html
AB  - The query complexity of \em locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size \citeProc:Indyk_STOC98. In industrial applications such as search where the data are often high-dimensional and binary (e.g., text n-grams),  \em minwise hashing is widely adopted, which requires applying a large number  of permutations on the data. This is  costly in computation and energy-consumption.    In this paper, we propose a  hashing technique which generates all the necessary hash evaluations needed for similarity search, using  one single permutation.  The heart of the proposed hash function is a  “rotation” scheme which densifies the sparse sketches of \em one permutation hashing \citeProc:Li_Owen_Zhang_NIPS12 in an unbiased fashion thereby maintaining the LSH property. This makes the obtained sketches suitable for hash table construction. This idea of rotation presented in this paper could be of independent  interest  for densifying  other types of sparse sketches.     Using our proposed hashing method, the  query time of a (K,L)-parameterized LSH is reduced from  the typical O(dKL) complexity to merely O(KL+dL), where d is the  number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables.  Our experimental evaluation on real data confirms that the proposed scheme significantly reduces the query processing time over minwise hashing without loss in retrieval accuracies.
ER  -

APA


Shrivastava, A. & Li, P.. (2014). Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):557-565 Available from https://proceedings.mlr.press/v32/shrivastava14.html.

Related Material

Download PDF