Data Structures for Density Estimation

Anders Aamand; Alexandr Andoni; Justin Y. Chen; Piotr Indyk; Shyam Narayanan; Sandeep Silwal

Data Structures for Density Estimation

Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:1-18, 2023.

Abstract

We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-aamand23a,
  title = 	 {Data Structures for Density Estimation},
  author =       {Aamand, Anders and Andoni, Alexandr and Chen, Justin Y. and Indyk, Piotr and Narayanan, Shyam and Silwal, Sandeep},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {1--18},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/aamand23a/aamand23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/aamand23a.html},
  abstract = 	 {We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.}
}

Endnote

%0 Conference Paper
%T Data Structures for Density Estimation
%A Anders Aamand
%A Alexandr Andoni
%A Justin Y. Chen
%A Piotr Indyk
%A Shyam Narayanan
%A Sandeep Silwal
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-aamand23a
%I PMLR
%P 1--18
%U https://proceedings.mlr.press/v202/aamand23a.html
%V 202
%X We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.

APA


Aamand, A., Andoni, A., Chen, J.Y., Indyk, P., Narayanan, S. & Silwal, S.. (2023). Data Structures for Density Estimation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:1-18 Available from https://proceedings.mlr.press/v202/aamand23a.html.

Data Structures for Density Estimation

Abstract

Cite this Paper

Related Material