Improving sign-random-projection via count sketch

Punit Pankaj Dubey, Bhisham Dev Verma, Rameshwar Pratap, Keegan Kang
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:599-609, 2022.

Abstract

Computing the angular similarity between pairs of vectors is a core part of various machine learning algorithms. The seminal work of Charikar (a.k.a. Sign-Random-Projection (SRP) or SimHash) provides an unbiased estimate for the same. However, SRP suffers from the following limitations: (i) large variance in the similarity estimation, (ii) and high running time while computing the sketch. There are improved variants that address these limitations. However, they are known to improve on only one aspect in their proposal, for e.g. Yu et al. suggest a faster algorithm, Ji et al., Kang and Wong, provide estimates with a smaller variance. In this work, we propose a sketching algorithm that addresses both aspects in one algorithm – a faster algorithm along with a smaller variance in the similarity estimation. Moreover, our algorithm is space-efficient as well. We present a rigorous theoretical analysis of our proposal and complement it via experiments on synthetic and real-world datasets.

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-dubey22a, title = {Improving sign-random-projection via count sketch}, author = {Dubey, Punit Pankaj and Verma, Bhisham Dev and Pratap, Rameshwar and Kang, Keegan}, booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence}, pages = {599--609}, year = {2022}, editor = {Cussens, James and Zhang, Kun}, volume = {180}, series = {Proceedings of Machine Learning Research}, month = {01--05 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v180/dubey22a/dubey22a.pdf}, url = {https://proceedings.mlr.press/v180/dubey22a.html}, abstract = {Computing the angular similarity between pairs of vectors is a core part of various machine learning algorithms. The seminal work of Charikar (a.k.a. Sign-Random-Projection (SRP) or SimHash) provides an unbiased estimate for the same. However, SRP suffers from the following limitations: (i) large variance in the similarity estimation, (ii) and high running time while computing the sketch. There are improved variants that address these limitations. However, they are known to improve on only one aspect in their proposal, for e.g. Yu et al. suggest a faster algorithm, Ji et al., Kang and Wong, provide estimates with a smaller variance. In this work, we propose a sketching algorithm that addresses both aspects in one algorithm – a faster algorithm along with a smaller variance in the similarity estimation. Moreover, our algorithm is space-efficient as well. We present a rigorous theoretical analysis of our proposal and complement it via experiments on synthetic and real-world datasets. } }
Endnote
%0 Conference Paper %T Improving sign-random-projection via count sketch %A Punit Pankaj Dubey %A Bhisham Dev Verma %A Rameshwar Pratap %A Keegan Kang %B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2022 %E James Cussens %E Kun Zhang %F pmlr-v180-dubey22a %I PMLR %P 599--609 %U https://proceedings.mlr.press/v180/dubey22a.html %V 180 %X Computing the angular similarity between pairs of vectors is a core part of various machine learning algorithms. The seminal work of Charikar (a.k.a. Sign-Random-Projection (SRP) or SimHash) provides an unbiased estimate for the same. However, SRP suffers from the following limitations: (i) large variance in the similarity estimation, (ii) and high running time while computing the sketch. There are improved variants that address these limitations. However, they are known to improve on only one aspect in their proposal, for e.g. Yu et al. suggest a faster algorithm, Ji et al., Kang and Wong, provide estimates with a smaller variance. In this work, we propose a sketching algorithm that addresses both aspects in one algorithm – a faster algorithm along with a smaller variance in the similarity estimation. Moreover, our algorithm is space-efficient as well. We present a rigorous theoretical analysis of our proposal and complement it via experiments on synthetic and real-world datasets.
APA
Dubey, P.P., Verma, B.D., Pratap, R. & Kang, K.. (2022). Improving sign-random-projection via count sketch. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:599-609 Available from https://proceedings.mlr.press/v180/dubey22a.html.

Related Material