Sharp Constants in Uniformity Testing via the Huber Statistic

Shivam Gupta, Eric Price
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:3113-3192, 2022.

Abstract

Uniformity testing is one of the most well-studied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with $1-\delta$ probability is $n = \Theta(\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2} + \frac{\log (1/\delta)}{\epsilon^2})$, which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank order the performance of existing testers, even in an asymptotic regime of all parameters tending to $0$ or $\infty$. We explain this discrepancy by studying the \emph{constant factors} required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of $(1 + o(1))\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2}$ in the regime where this term is dominant, unlike all other existing testers.
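The collisions tester mentioned in the abstract can be sketched as follows. This is an illustrative simplification, not the paper's sharp-constant analysis: it counts colliding sample pairs and compares the empirical collision rate against a threshold midway between the uniform value $1/m$ and the lower bound $(1+4\epsilon^2)/m$ on $\|p\|_2^2$ for $\epsilon$-far distributions; the threshold choice here is a standard textbook one, assumed for illustration.

```python
from math import comb


def collision_tester(samples, m, eps):
    """Decide 'uniform' vs 'far' from collision counts (illustrative sketch)."""
    n = len(samples)
    # Count how many of the C(n, 2) sample pairs collide.
    counts = {}
    for x in samples:
        counts[x] = counts.get(x, 0) + 1
    collisions = sum(comb(c, 2) for c in counts.values())
    stat = collisions / comb(n, 2)
    # Under uniformity E[stat] = 1/m; if p is eps-far in TV,
    # E[stat] = ||p||_2^2 >= (1 + 4 eps^2)/m.  Threshold in between.
    return "uniform" if stat < (1 + 2 * eps**2) / m else "far"
```

For example, four distinct samples from a domain of size 4 produce zero collisions and pass, while four identical samples collide on every pair and are rejected.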

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-gupta22a,
  title     = {Sharp Constants in Uniformity Testing via the Huber Statistic},
  author    = {Gupta, Shivam and Price, Eric},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {3113--3192},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/gupta22a/gupta22a.pdf},
  url       = {https://proceedings.mlr.press/v178/gupta22a.html},
  abstract  = {Uniformity testing is one of the most well-studied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with $1-\delta$ probability is $n = \Theta(\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2} + \frac{\log (1/\delta)}{\epsilon^2})$, which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank order the performance of existing testers, even in an asymptotic regime of all parameters tending to $0$ or $\infty$. We explain this discrepancy by studying the \emph{constant factors} required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of $(1 + o(1))\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2}$ in the regime where this term is dominant, unlike all other existing testers.}
}
Endnote
%0 Conference Paper
%T Sharp Constants in Uniformity Testing via the Huber Statistic
%A Shivam Gupta
%A Eric Price
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-gupta22a
%I PMLR
%P 3113--3192
%U https://proceedings.mlr.press/v178/gupta22a.html
%V 178
%X Uniformity testing is one of the most well-studied problems in property testing, with many known test statistics, including ones based on counting collisions, singletons, and the empirical TV distance. It is known that the optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with $1-\delta$ probability is $n = \Theta(\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2} + \frac{\log (1/\delta)}{\epsilon^2})$, which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank order the performance of existing testers, even in an asymptotic regime of all parameters tending to $0$ or $\infty$. We explain this discrepancy by studying the \emph{constant factors} required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of $(1 + o(1))\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2}$ in the regime where this term is dominant, unlike all other existing testers.
APA
Gupta, S. & Price, E. (2022). Sharp Constants in Uniformity Testing via the Huber Statistic. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:3113-3192. Available from https://proceedings.mlr.press/v178/gupta22a.html.