Single versus Multiple Sorting in All Pairs Similarity Search

Yasuo Tabei; Takeaki Uno; Masashi Sugiyama; Koji Tsuda

Single versus Multiple Sorting in All Pairs Similarity Search

Yasuo Tabei, Takeaki Uno, Masashi Sugiyama, Koji Tsuda

Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:145-160, 2010.

Abstract

To save memory and improve speed, vectorial data such as images and signals are often represented as strings of discrete symbols (i.e., sketches). Chariker (2002) proposed a fast approximate method for finding neighbor pairs of strings by sorting and scanning with a small window. This method, which we shall call 'single sorting', is applied to locality sensitive codes and prevalently used in speed-demanding web-related applications. To improve on single sorting, we propose a novel method that employs blockwise masked sorting. Our method can dramatically reduce the number of candidate pairs which have to be verified by distance calculation in exchange with an increased amount of sorting operations. So it is especially attractive for high dimensional dense data, where distance calculation is expensive. Empirical results show the efficiency of our method in comparison to single sorting and recent fast nearest neighbor methods.

Cite this Paper

BibTeX


@InProceedings{pmlr-v13-tabei10a,
  title = 	 {Single versus Multiple Sorting in All Pairs Similarity Search},
  author = 	 {Tabei, Yasuo and Uno, Takeaki and Sugiyama, Masashi and Tsuda, Koji},
  booktitle = 	 {Proceedings of 2nd Asian Conference on Machine Learning},
  pages = 	 {145--160},
  year = 	 {2010},
  editor = 	 {Sugiyama, Masashi and Yang, Qiang},
  volume = 	 {13},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Tokyo, Japan},
  month = 	 {08--10 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v13/tabei10a/tabei10a.pdf},
  url = 	 {https://proceedings.mlr.press/v13/tabei10a.html},
  abstract = 	 {To save memory and improve speed, vectorial data such as images and signals are often represented as strings of discrete symbols (i.e., sketches). Chariker (2002) proposed a fast approximate method for finding neighbor pairs of strings by sorting and scanning with a small window. This method, which we shall call 'single sorting', is applied to locality sensitive codes and prevalently used in speed-demanding web-related applications. To improve on single sorting, we propose a novel method that employs blockwise masked sorting. Our method can dramatically reduce the number of candidate pairs which have to be verified by distance calculation in exchange with an increased amount of sorting operations. So it is especially attractive for high dimensional dense data, where distance calculation is expensive. Empirical results show the efficiency of our method in comparison to single sorting and recent fast nearest neighbor methods.}
}

Endnote

%0 Conference Paper
%T Single versus Multiple Sorting in All Pairs Similarity Search
%A Yasuo Tabei
%A Takeaki Uno
%A Masashi Sugiyama
%A Koji Tsuda
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang	
%F pmlr-v13-tabei10a
%I PMLR
%P 145--160
%U https://proceedings.mlr.press/v13/tabei10a.html
%V 13
%X To save memory and improve speed, vectorial data such as images and signals are often represented as strings of discrete symbols (i.e., sketches). Chariker (2002) proposed a fast approximate method for finding neighbor pairs of strings by sorting and scanning with a small window. This method, which we shall call 'single sorting', is applied to locality sensitive codes and prevalently used in speed-demanding web-related applications. To improve on single sorting, we propose a novel method that employs blockwise masked sorting. Our method can dramatically reduce the number of candidate pairs which have to be verified by distance calculation in exchange with an increased amount of sorting operations. So it is especially attractive for high dimensional dense data, where distance calculation is expensive. Empirical results show the efficiency of our method in comparison to single sorting and recent fast nearest neighbor methods.

RIS


TY  - CPAPER
TI  - Single versus Multiple Sorting in All Pairs Similarity Search
AU  - Yasuo Tabei
AU  - Takeaki Uno
AU  - Masashi Sugiyama
AU  - Koji Tsuda
BT  - Proceedings of 2nd Asian Conference on Machine Learning
DA  - 2010/10/31
ED  - Masashi Sugiyama
ED  - Qiang Yang	
ID  - pmlr-v13-tabei10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 13
SP  - 145
EP  - 160
L1  - http://proceedings.mlr.press/v13/tabei10a/tabei10a.pdf
UR  - https://proceedings.mlr.press/v13/tabei10a.html
AB  - To save memory and improve speed, vectorial data such as images and signals are often represented as strings of discrete symbols (i.e., sketches). Chariker (2002) proposed a fast approximate method for finding neighbor pairs of strings by sorting and scanning with a small window. This method, which we shall call 'single sorting', is applied to locality sensitive codes and prevalently used in speed-demanding web-related applications. To improve on single sorting, we propose a novel method that employs blockwise masked sorting. Our method can dramatically reduce the number of candidate pairs which have to be verified by distance calculation in exchange with an increased amount of sorting operations. So it is especially attractive for high dimensional dense data, where distance calculation is expensive. Empirical results show the efficiency of our method in comparison to single sorting and recent fast nearest neighbor methods.
ER  -

APA


Tabei, Y., Uno, T., Sugiyama, M. & Tsuda, K.. (2010). Single versus Multiple Sorting in All Pairs Similarity Search. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:145-160 Available from https://proceedings.mlr.press/v13/tabei10a.html.

Related Material

Download PDF