Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick

Keegan Kang, Sergey Kushnarev, Wei Pin Wong, Rameshwar Pratap, Haikal Yeo, Chen Yijia
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:814-829, 2021.

Abstract

Hashing algorithms are widely used for large-scale learning and similarity search, with computationally cheaper and more accurate algorithms proposed every year. In this paper we focus on hashing algorithms that estimate a distance measure $d(\vec{x}_i,\vec{x}_j)$ between two vectors $\vec{x}_i, \vec{x}_j$. Such hashing algorithms require the generation of random variables, and we propose two approaches to reducing the variance of the hashed estimates: control variates and maximum likelihood estimation. We explain how these approaches can be applied immediately to a wide subset of hashing algorithms, evaluate their impact on various datasets, and run empirical simulations to verify our results.
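
To make the control variates trick concrete, below is a minimal Python sketch in the classical random-projection setting, where the squared norms $\|\vec{x}_i\|^2, \|\vec{x}_j\|^2$ are known exactly and serve as zero-mean control variates for an inner-product estimate. The function name cv_inner_product and the plug-in choice of coefficients are illustrative assumptions, not the paper's exact estimators.

import numpy as np

def cv_inner_product(x, y, n_proj=128, seed=None):
    # Estimate <x, y> from n_proj Gaussian random projections, using the
    # exactly known squared norms of x and y as control variates.
    # (Illustrative sketch; not the estimator defined in the paper.)
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((n_proj, x.shape[0]))
    u, v = R @ x, R @ y                # E[u_k * v_k] = <x, y>

    t = u * v                          # per-projection estimates of the target
    c1 = u ** 2 - x @ x                # control variate: mean exactly 0
    c2 = v ** 2 - y @ y                # control variate: mean exactly 0

    naive = t.mean()                   # plain random-projection estimate

    # Variance-minimizing coefficients solve Cov(C) beta = Cov(C, t);
    # here both covariances are estimated from the same sample.
    C = np.column_stack([c1, c2])
    cov_C = np.cov(C, rowvar=False)
    cov_Ct = np.array([np.cov(c1, t)[0, 1], np.cov(c2, t)[0, 1]])
    beta = np.linalg.solve(cov_C, cov_Ct)

    adjusted = naive - C.mean(axis=0) @ beta
    return naive, adjusted

# Correlated vectors, so the control variates carry real information.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 0.8 * x + 0.6 * rng.standard_normal(500)
runs = np.array([cv_inner_product(x, y, seed=s) for s in range(2000)])
print("true inner product :", x @ y)
print("naive variance     :", runs[:, 0].var())
print("adjusted variance  :", runs[:, 1].var())

Because the control variates have known mean zero, subtracting any linear combination of them leaves the estimator centered on the target; the solved coefficients are the sample analogue of the variance-minimizing choice, at the cost of a small bias from estimating them on the same projections. Running the snippet should show the adjusted estimates concentrating noticeably more tightly around the true inner product than the naive ones.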

Cite this Paper

BibTeX
@InProceedings{pmlr-v157-kang21a,
  title     = {Improving Hashing Algorithms for Similarity Search \textit{via} MLE and the Control Variates Trick},
  author    = {Kang, Keegan and Kushnarev, Sergey and Wong, {Wei Pin} and Pratap, Rameshwar and Yeo, Haikal and Yijia, Chen},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {814--829},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/kang21a/kang21a.pdf},
  url       = {https://proceedings.mlr.press/v157/kang21a.html}
}
Endnote
%0 Conference Paper
%T Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick
%A Keegan Kang
%A Sergey Kushnarev
%A Wei Pin Wong
%A Rameshwar Pratap
%A Haikal Yeo
%A Chen Yijia
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-kang21a
%I PMLR
%P 814--829
%U https://proceedings.mlr.press/v157/kang21a.html
%V 157
APA
Kang, K., Kushnarev, S., Wong, W.P., Pratap, R., Yeo, H. & Yijia, C. (2021). Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:814-829. Available from https://proceedings.mlr.press/v157/kang21a.html.