Closeness testing from distributed measurements
Proceedings of The 37th International Conference on Algorithmic Learning Theory, PMLR 313:1-23, 2026.
Abstract
We consider the fundamental task of two-sample composite hypothesis testing (i.e., "closeness testing") in a distributed setting, where a central party holds a dataset of $M \gg 1$ observations from an unknown discrete probability distribution $q$ over a universe of size $k$, and each of $n$ individual parties independently observes one realization from an unknown distribution $p$. The goal is for the central party to test whether $p$ and $q$ are equal or differ significantly in statistical distance, while receiving only a small amount of information (at most $\ell \leq \log_2 k$ bits) from each of the $n$ distributed parties. Our main contribution is a time- and sample-efficient algorithm for this task, applicable across the entire range of parameters. Our theoretical guarantees match the optimal sample complexities in the special cases already studied in the literature, e.g., when $\ell = \log_2 k$ (no information constraint) or $M \to \infty$ (reference distribution fully known to the central party).
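The communication-constrained setting can be illustrated with a toy simulation; this is only a sketch of the model, not the paper's algorithm. Here each party is assumed to compress its single sample to $\ell$ bits via a shared random hash, and the central party compares the hashed empirical frequencies of the $n$ messages against those of its own $M$ samples from $q$. The function names and the L1-based statistic are illustrative choices.

```python
import random
from collections import Counter

def ell_bit_message(x, ell, seed):
    # A party compresses its sample x to ell bits using a shared random
    # hash (a toy choice; the paper's protocol is more refined).
    rng = random.Random(hash((seed, x)))
    return rng.randrange(2 ** ell)

def distributed_closeness_stat(p_samples, q_samples, ell, seed=0):
    # The central party compares the hashed empirical frequencies of the
    # n one-sample messages (from p) and its own M samples (from q).
    hp = Counter(ell_bit_message(x, ell, seed) for x in p_samples)
    hq = Counter(ell_bit_message(x, ell, seed) for x in q_samples)
    n, m = len(p_samples), len(q_samples)
    buckets = set(hp) | set(hq)
    # L1 distance between the two hashed empirical distributions.
    return sum(abs(hp[b] / n - hq[b] / m) for b in buckets)

# Toy usage over a universe of size k = 100: compare p = q against
# a p with support disjoint from q's.
k, n, M, ell = 100, 2000, 2000, 4
rng = random.Random(1)
q_samples = [rng.randrange(k) for _ in range(M)]
same = [rng.randrange(k) for _ in range(n)]       # p = q (uniform)
far = [rng.randrange(k, 2 * k) for _ in range(n)]  # disjoint support
print(distributed_closeness_stat(same, q_samples, ell) <
      distributed_closeness_stat(far, q_samples, ell))
```

With $\ell = 4$ each message is one of $2^\ell = 16$ buckets, so the statistic is far from optimal, but the gap between the equal and far cases is already visible at these sample sizes.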