On The Memory Complexity of Uniformity Testing

Tomer Berg, Or Ordentlich, Ofer Shayevitz
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:3506-3523, 2022.

Abstract

In this paper we consider the problem of uniformity testing with limited memory. We observe a sequence of independent, identically distributed random variables drawn from a distribution $p$ over $[n]$, which is either uniform or is $\varepsilon$-far from uniform under the total variation distance, and our goal is to determine the correct hypothesis. At each time point we are allowed to update the state of a finite-memory machine with $S$ states, where each state of the machine is assigned one of the hypotheses, and we are interested in obtaining an asymptotic probability of error of at most $\delta$, for some fixed $0<\delta<1/2$, uniformly under both hypotheses. The main contribution of this paper is deriving upper and lower bounds on the number of states $S$ needed in order to achieve a constant error probability $\delta$, as a function of $n$ and $\varepsilon$: our upper bound is $O(\frac{n\log n}{\varepsilon})$ and our lower bound is $\Omega(n+\frac{1}{\varepsilon})$. Prior works in the field have almost exclusively used collision counting for upper bounds and the Paninski mixture for lower bounds. Somewhat surprisingly, in the limited-memory, unlimited-samples setup, the optimal solution does not involve counting collisions, and the Paninski prior is not hard, so different proof techniques are needed in order to attain our bounds.
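To make the model concrete, the following is a minimal Python sketch of the testing setup the abstract describes: an $S$-state machine that reads an i.i.d. stream of symbols from $[n]$, updates its state through a fixed transition function, and outputs the hypothesis label of its current state. The FiniteMemoryTester class and the clamped-random-walk machine below are our own illustrative assumptions, not the paper's construction (which, as the abstract notes, is also not collision counting).

import random

# Sketch of the finite-memory testing model: a machine with S states
# reads an i.i.d. stream of symbols from [n], updates its state via a
# transition function f: [S] x [n] -> [S], and each state carries a
# hypothesis label; the output at any time is the label of the current
# state. Everything named here is an illustration of the model only.

class FiniteMemoryTester:
    """An S-state machine whose decision is the label of its state."""
    def __init__(self, transition, label, start=0):
        self.transition = transition  # f(state, symbol) -> state
        self.label = label            # label(state) -> "uniform" / "far"
        self.state = start

    def step(self, symbol):
        self.state = self.transition(self.state, symbol)

    def decision(self):
        return self.label(self.state)

# Toy machine with S = 2K + 1 states {-K, ..., K}: a clamped counter of
# (#occurrences of symbol 0) - (#occurrences of symbol 1). Under the
# uniform distribution the counter performs an unbiased (lazy) random
# walk; if the distribution puts more mass on symbol 0 than on symbol 1,
# it drifts toward the upper boundary. It only compares two symbols, so
# it is NOT a sound uniformity tester in general -- illustration only.
K = 25

def walk(state, symbol):
    if symbol == 0:
        return min(state + 1, K)
    if symbol == 1:
        return max(state - 1, -K)
    return state  # all other symbols leave the state unchanged

def walk_label(state):
    return "far" if abs(state) == K else "uniform"

def run(dist, num_samples, seed=0):
    rng = random.Random(seed)
    tester = FiniteMemoryTester(walk, walk_label, start=0)
    for symbol in rng.choices(range(len(dist)), weights=dist, k=num_samples):
        tester.step(symbol)
    return tester.decision()

n, eps = 10, 0.2
uniform = [1.0 / n] * n
# A perturbation the toy walk happens to detect: shift eps/2 probability
# mass from symbol 1 to symbol 0 (total variation distance eps/2).
far = list(uniform)
far[0] += eps / 2
far[1] -= eps / 2

print(run(uniform, 200_000))  # typically prints "uniform"
print(run(far, 200_000))      # typically prints "far"

The parameter K illustrates the tradeoff the abstract quantifies: more states make a spurious boundary hit under the uniform distribution rarer, at the price of memory.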

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-berg22a,
  title     = {On The Memory Complexity of Uniformity Testing},
  author    = {Berg, Tomer and Ordentlich, Or and Shayevitz, Ofer},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {3506--3523},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/berg22a/berg22a.pdf},
  url       = {https://proceedings.mlr.press/v178/berg22a.html}
}
Endnote
%0 Conference Paper
%T On The Memory Complexity of Uniformity Testing
%A Tomer Berg
%A Or Ordentlich
%A Ofer Shayevitz
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-berg22a
%I PMLR
%P 3506--3523
%U https://proceedings.mlr.press/v178/berg22a.html
%V 178
APA
Berg, T., Ordentlich, O., & Shayevitz, O. (2022). On The Memory Complexity of Uniformity Testing. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:3506-3523. Available from https://proceedings.mlr.press/v178/berg22a.html.