Improving Hashing Algorithms for Similarity Search via MLE and the Control Variates Trick
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:814-829, 2021.
Abstract
Hashing algorithms are widely used for large-scale learning and similarity search, with computationally cheaper and more accurate algorithms being proposed every year. In this paper we focus on hashing algorithms that estimate a distance measure $d(\vec{x}_i, \vec{x}_j)$ between two vectors $\vec{x}_i, \vec{x}_j$. Such hashing algorithms require the generation of random variables, and we propose two approaches to reduce the variance of the hashed estimates: control variates and maximum likelihood estimates. We explain how these approaches can be applied directly to a wide subset of hashing algorithms. Further, we evaluate the impact of these methods on various datasets. Finally, we run empirical simulations to verify our results.
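To make the control variates idea concrete, the following is a minimal sketch and not the estimators derived in the paper. It assumes a Gaussian random projection as the hashing scheme and the inner product as the quantity being estimated. The function name `estimate_inner_product`, the control variable `a**2 + b**2`, and all parameter values are illustrative assumptions. The sketch relies on the fact that the mean of the control variable is known exactly (it equals the sum of the squared norms), so its sampling error can be used to correct the plain estimate.

```python
import numpy as np

def estimate_inner_product(x, y, k, rng):
    """Return (plain, control-variate) random-projection estimates of <x, y>.

    Illustrative sketch only: Gaussian projections with an empirically
    fitted control-variate coefficient.
    """
    R = rng.standard_normal((k, x.size))  # k Gaussian projection directions
    a, b = R @ x, R @ y                   # projected (hashed) coordinates
    target = a * b                        # E[a * b] = <x, y>
    control = a**2 + b**2                 # E[control] = ||x||^2 + ||y||^2
    control_mean = x @ x + y @ y          # known exactly from the norms
    # Empirical estimate of the variance-minimizing coefficient
    # c = Cov(target, control) / Var(control).
    cov = np.cov(target, control)
    c = cov[0, 1] / cov[1, 1]
    plain = target.mean()
    corrected = plain - c * (control.mean() - control_mean)
    return plain, corrected

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
y = 0.8 * x + 0.6 * rng.standard_normal(50)  # y is strongly correlated with x

# Repeat the experiment to compare the spread of the two estimators.
trials = np.array([estimate_inner_product(x, y, 64, rng) for _ in range(2000)])
plain_var, cv_var = trials.var(axis=0)
print(f"true <x, y>:            {x @ y:.3f}")
print(f"plain estimator var:    {plain_var:.3f}")
print(f"control-variate var:    {cv_var:.3f}")
```

Because the coefficient `c` is fitted from the same samples, the corrected estimate carries a small bias that shrinks as `k` grows. The variance reduction is largest when the two vectors are strongly correlated, since the product `a * b` then moves closely with the control variable.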