Local Density Estimation in High Dimensions

Xian Wu; Moses Charikar; Vishnu Natchu

Local Density Estimation in High Dimensions

Xian Wu, Moses Charikar, Vishnu Natchu

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5296-5305, 2018.

Abstract

An important question that arises in the study of high dimensional vector representations learned from data is: given a set D of vectors and a query q, estimate the number of points within a specified distance threshold of q. Our algorithm uses locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via an unbiased estimator that uses importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and query complexity of our scheme, and demonstrate the effectiveness of our algorithm by experiments on a standard word embedding dataset.

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-wu18a,
  title = 	 {Local Density Estimation in High Dimensions},
  author =       {Wu, Xian and Charikar, Moses and Natchu, Vishnu},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {5296--5305},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/wu18a/wu18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/wu18a.html},
  abstract = 	 {An important question that arises in the study of high dimensional vector representations learned from data is: given a set D of vectors and a query q, estimate the number of points within a specified distance threshold of q. Our algorithm uses locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via an unbiased estimator that uses importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and query complexity of our scheme, and demonstrate the effectiveness of our algorithm by experiments on a standard word embedding dataset.}
}

Endnote

%0 Conference Paper
%T Local Density Estimation in High Dimensions
%A Xian Wu
%A Moses Charikar
%A Vishnu Natchu
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause	
%F pmlr-v80-wu18a
%I PMLR
%P 5296--5305
%U https://proceedings.mlr.press/v80/wu18a.html
%V 80
%X An important question that arises in the study of high dimensional vector representations learned from data is: given a set D of vectors and a query q, estimate the number of points within a specified distance threshold of q. Our algorithm uses locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via an unbiased estimator that uses importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and query complexity of our scheme, and demonstrate the effectiveness of our algorithm by experiments on a standard word embedding dataset.

APA

Wu, X., Charikar, M. & Natchu, V.. (2018). Local Density Estimation in High Dimensions. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5296-5305 Available from https://proceedings.mlr.press/v80/wu18a.html.

Local Density Estimation in High Dimensions

Abstract

Cite this Paper

Related Material