Distance Estimation for High-Dimensional Discrete Distributions

Kuldeep S. Meel, Gunjan Kumar, Yash Pote
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:955-963, 2025.

Abstract

Given two distributions $\mathcal{P}$ and $\mathcal{Q}$ over a high-dimensional domain $\{0,1\}^n$, and a parameter $\varepsilon$, the goal of distance estimation is to determine the statistical distance between $\mathcal{P}$ and $\mathcal{Q}$, up to an additive tolerance $\pm \varepsilon$. Since exponential lower bounds (in $n$) are known for the problem in the standard sampling model, research has focused on richer query models where one can draw conditional samples. This paper presents the first polynomial query distance estimator in the conditional sampling model ($\mathsf{COND}$). We base our algorithm on the relatively weaker \textit{subcube conditional} sampling ($\mathsf{SUBCOND}$) oracle, which draws samples from the distribution conditioned on some of the dimensions. $\mathsf{SUBCOND}$ is a promising model for widespread practical use because it captures the natural behavior of discrete samplers. Our algorithm makes $\tilde{\mathcal{O}}(n^3/\varepsilon^5)$ queries to $\mathsf{SUBCOND}$.
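
The abstract's two central objects, the statistical (total variation) distance and the subcube conditional sampling oracle, can be made concrete with a small sketch. The following Python snippet is illustrative only and is not from the paper: it assumes a toy distribution stored explicitly as a dictionary over $\{0,1\}^n$, whereas the paper's algorithm accesses $\mathcal{P}$ and $\mathcal{Q}$ only through $\mathsf{SUBCOND}$ queries; the names tv_distance, subcond_sample, P, and Q are hypothetical.

import random

def tv_distance(p, q):
    # Statistical (total variation) distance:
    # d_TV(P, Q) = (1/2) * sum_x |P(x) - Q(x)|.
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def subcond_sample(p, partial):
    # SUBCOND-style oracle: sample from p conditioned on the subcube that
    # fixes the coordinates in `partial` (a dict {index: bit}); the remaining
    # coordinates are drawn from the conditional distribution.
    consistent = {x: w for x, w in p.items()
                  if all(x[i] == b for i, b in partial.items())}
    if sum(consistent.values()) == 0:
        raise ValueError("conditioning event has zero probability")
    points, weights = zip(*consistent.items())
    return random.choices(points, weights=weights, k=1)[0]

# Toy distributions over {0,1}^2.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
Q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(tv_distance(P, Q))          # 0.3, the quantity the estimator approximates to within +/- epsilon
print(subcond_sample(P, {0: 1}))  # a draw from P conditioned on x_0 = 1

The brute-force tv_distance above needs the full probability tables and therefore scales as $2^n$; the point of the paper is to approximate the same quantity to within $\pm\varepsilon$ using only $\tilde{\mathcal{O}}(n^3/\varepsilon^5)$ subcube-conditional samples.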

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-meel25a,
  title     = {Distance Estimation for High-Dimensional Discrete Distributions},
  author    = {Meel, Kuldeep S. and Kumar, Gunjan and Pote, Yash},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {955--963},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/meel25a/meel25a.pdf},
  url       = {https://proceedings.mlr.press/v258/meel25a.html},
  abstract  = {Given two distributions $\mathcal{P}$ and $\mathcal{Q}$ over a high-dimensional domain $\{0,1\}^n$, and a parameter $\varepsilon$, the goal of distance estimation is to determine the statistical distance between $\mathcal{P}$ and $\mathcal{Q}$, up to an additive tolerance $\pm \varepsilon$. Since exponential lower bounds (in $n$) are known for the problem in the standard sampling model, research has focused on richer query models where one can draw conditional samples. This paper presents the first polynomial query distance estimator in the conditional sampling model ($\mathsf{COND}$). We base our algorithm on the relatively weaker \textit{subcube conditional} sampling ($\mathsf{SUBCOND}$) oracle, which draws samples from the distribution conditioned on some of the dimensions. $\mathsf{SUBCOND}$ is a promising model for widespread practical use because it captures the natural behavior of discrete samplers. Our algorithm makes $\tilde{\mathcal{O}}(n^3/\varepsilon^5)$ queries to $\mathsf{SUBCOND}$.}
}
EndNote
%0 Conference Paper
%T Distance Estimation for High-Dimensional Discrete Distributions
%A Kuldeep S. Meel
%A Gunjan Kumar
%A Yash Pote
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-meel25a
%I PMLR
%P 955--963
%U https://proceedings.mlr.press/v258/meel25a.html
%V 258
%X Given two distributions $\mathcal{P}$ and $\mathcal{Q}$ over a high-dimensional domain $\{0,1\}^n$, and a parameter $\varepsilon$, the goal of distance estimation is to determine the statistical distance between $\mathcal{P}$ and $\mathcal{Q}$, up to an additive tolerance $\pm \varepsilon$. Since exponential lower bounds (in $n$) are known for the problem in the standard sampling model, research has focused on richer query models where one can draw conditional samples. This paper presents the first polynomial query distance estimator in the conditional sampling model ($\mathsf{COND}$). We base our algorithm on the relatively weaker \textit{subcube conditional} sampling ($\mathsf{SUBCOND}$) oracle, which draws samples from the distribution conditioned on some of the dimensions. $\mathsf{SUBCOND}$ is a promising model for widespread practical use because it captures the natural behavior of discrete samplers. Our algorithm makes $\tilde{\mathcal{O}}(n^3/\varepsilon^5)$ queries to $\mathsf{SUBCOND}$.
APA
Meel, K.S., Kumar, G. & Pote, Y. (2025). Distance Estimation for High-Dimensional Discrete Distributions. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:955-963. Available from https://proceedings.mlr.press/v258/meel25a.html.
