Data Structures for Density Estimation

Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:1-18, 2023.

Abstract

We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify a $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies such a $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports such a $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to reach a given accuracy compared to prior work.
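To make the task concrete, the following is a minimal sketch of a naive baseline for the selection problem described above; it is not the paper's data structure, and the domain size, number of candidates, and sample count below are arbitrary illustration values. The baseline estimates $p$ with an empirical histogram and scans all $k$ candidates for the one closest in total variation distance, so its query time is linear in $k$.

import numpy as np

rng = np.random.default_rng(seed=0)

# Illustration parameters (not from the paper): domain size n, number of
# candidate distributions k, and number of samples m drawn from p.
n, k, m = 500, 20, 5000

# k known candidate distributions v_1, ..., v_k over {0, ..., n-1}.
V = rng.dirichlet(np.ones(n), size=k)

# The unknown distribution p; here it coincides with one of the candidates.
true_index = 7
p = V[true_index]

# Empirical histogram built from m i.i.d. samples of p.
samples = rng.choice(n, size=m, p=p)
p_hat = np.bincount(samples, minlength=n) / m

# Naive selection: scan all k candidates and return the one closest to the
# empirical histogram in total variation distance (half the l1 distance).
# This scan costs O(nk) time, i.e. linear in k.
tv = 0.5 * np.abs(V - p_hat).sum(axis=1)
print("selected candidate:", int(tv.argmin()), "(true:", true_index, ")")

By contrast, the paper's main result answers such queries in time sublinear in $k$ while using a number of samples sublinear in $n$, neither of which this baseline achieves.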

Cite this Paper

BibTeX
@InProceedings{pmlr-v202-aamand23a,
  title     = {Data Structures for Density Estimation},
  author    = {Aamand, Anders and Andoni, Alexandr and Chen, Justin Y. and Indyk, Piotr and Narayanan, Shyam and Silwal, Sandeep},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {1--18},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/aamand23a/aamand23a.pdf},
  url       = {https://proceedings.mlr.press/v202/aamand23a.html}
}
APA
Aamand, A., Andoni, A., Chen, J.Y., Indyk, P., Narayanan, S. & Silwal, S. (2023). Data Structures for Density Estimation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:1-18. Available from https://proceedings.mlr.press/v202/aamand23a.html.
