Open Problem: Tight Characterization of Instance-Optimal Identity Testing

Clément Canonne
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:5312-5316, 2024.

Abstract

In the “instance-optimal” identity testing introduced by Valiant and Valiant (2014), one is given the (succinct) description of a discrete probability distribution $q$, as well as a parameter $\varepsilon\in(0,1]$ and i.i.d. samples from an (unknown, arbitrary) discrete distribution $p$. The goal is to distinguish with high probability between the cases (i) $p=q$ and (ii) $\textrm{TV}(p,q) > \varepsilon$, using the minimum number of samples possible as a function of (some simple functional of) $q$ and $\varepsilon$. This is in contrast with the standard formulation of identity testing, where the sample complexity is taken as worst-case over all possible reference distributions $q$. Valiant and Valiant provided upper and lower bounds on this question, where the sample complexity is expressed in terms of the “$\ell_{2/3}$ norm” of a (truncated version of the) reference distribution $q$. However, these upper and lower bounds do not always match up to constant factors, and can differ by an arbitrary multiplicative gap for some choices of $q$. The question then is: what is the tight characterization of the sample complexity of instance-optimal identity testing? What is the “right” functional $\Phi(q)$ for it?
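To make the functional concrete, here is a minimal Python sketch of the “$\ell_{2/3}$ norm” of a truncated reference distribution, assuming the common convention of removing the single largest entry of $q$ together with a smallest-probability tail of total mass at most $\varepsilon$. The helper names (ell_23_norm, vv_functional) and the exact truncation constant are illustrative only; Valiant and Valiant's upper and lower bounds use different constant factors in place of $\varepsilon$.

import numpy as np

def ell_23_norm(v):
    # The l_{2/3} quasi-norm: (sum_i v_i^{2/3})^{3/2}.
    return float(np.sum(np.asarray(v, dtype=float) ** (2.0 / 3.0)) ** 1.5)

def vv_functional(q, eps):
    # l_{2/3} quasi-norm of the truncated distribution: drop the single
    # largest entry of q, then greedily remove the smallest entries
    # while their total mass stays at most eps. (Illustrative truncation;
    # the actual bounds use different constants in front of eps.)
    q = np.sort(np.asarray(q, dtype=float))[::-1]  # sort descending
    q = q[1:]                                      # remove largest entry
    # number of smallest entries whose cumulative mass is <= eps
    tail_mass = np.cumsum(q[::-1])
    drop = np.searchsorted(tail_mass, eps, side="right")
    return ell_23_norm(q[: len(q) - drop])

# Example: for the uniform distribution over n elements, the functional is
# roughly sqrt(n) (up to the truncation), so the resulting bound of
# vv_functional(q, eps) / eps**2 recovers the familiar worst-case
# Theta(sqrt(n) / eps^2) sample complexity of identity testing.
n, eps = 1000, 0.1
q = np.full(n, 1.0 / n)
print(vv_functional(q, eps))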

