Closeness testing from distributed measurements

Clement Louis Canonne, Aditya Vikram Singh
Proceedings of The 37th International Conference on Algorithmic Learning Theory, PMLR 313:1-23, 2026.

Abstract

We consider the fundamental task of two-sample composite hypothesis testing (i.e., "closeness testing") in a distributed setting, where a central party holds a dataset of $M \gg 1$ observations from an unknown discrete probability distribution $q$ over a universe of size $k$, and individual parties each independently observe one realization from an unknown distribution $p$. The goal is for the central party to test whether $p$ and $q$ are equal, or differ significantly in statistical distance, while only receiving a small amount of information (at most $\ell \leq \log_2 k$ bits) from each of the $n$ distributed entities. Our main contribution is a time- and sample-efficient algorithm for this task, applicable across the whole regime of parameters. Our theoretical guarantees match the optimal sample complexities in the specific cases already studied in the literature, e.g., when $\ell = \log_2 k$ (no information constraint) or $M \to \infty$ (reference distribution fully known to the central party).
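To make the setting concrete, the following is a minimal Python sketch of one simple hash-and-compare protocol for this problem: each party maps its single sample into one of $2^\ell$ buckets via a shared random partition of the domain and sends only that $\ell$-bit bucket index, and the central party compares the resulting histogram against the bucket histogram induced by its own $M$ reference samples from $q$. This is an illustrative baseline under assumed parameter choices, not the paper's algorithm; all names and thresholds here are invented for the example.

```python
import random
from collections import Counter

def bucket_statistic(p, q, k, ell, n, M, seed=0):
    """One run of a toy communication-limited closeness test.

    Each of n distributed parties holds a single draw from p and sends
    only the ell-bit index of its sample under a shared random partition
    of {0, ..., k-1} into 2**ell buckets.  The central party buckets its
    own M reference draws from q the same way and returns the L1 distance
    between the two empirical bucket histograms.  (Illustrative only;
    not the algorithm from the paper.)
    """
    rng = random.Random(seed)
    buckets = [rng.randrange(2 ** ell) for _ in range(k)]  # shared random partition

    # Distributed phase: each party's ell-bit message is its bucket index.
    msg = Counter(buckets[x] for x in rng.choices(range(k), weights=p, k=n))

    # Central phase: the central party buckets its own samples from q.
    ref = Counter(buckets[y] for y in rng.choices(range(k), weights=q, k=M))

    # L1 distance between the two empirical bucket distributions.
    return sum(abs(msg[b] / n - ref[b] / M) for b in range(2 ** ell))

k, ell, n, M = 64, 3, 5000, 20000               # note ell <= log2(k)
uniform = [1.0 / k] * k
half = [2.0 / k] * (k // 2) + [0.0] * (k // 2)  # p supported on half the domain

s_same = bucket_statistic(uniform, uniform, k, ell, n, M)
s_diff = bucket_statistic(half, uniform, k, ell, n, M)
print(s_same, s_diff)  # the statistic is noticeably larger when p != q
```

Thresholding the returned statistic gives an accept/reject rule; random hashing loses part of the statistical distance between $p$ and $q$, which is why communication-constrained testing needs more parties $n$ than the unconstrained $\ell = \log_2 k$ case.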

Cite this Paper


BibTeX
@InProceedings{pmlr-v313-canonne26a,
  title     = {Closeness testing from distributed measurements},
  author    = {Canonne, Clement Louis and Singh, Aditya Vikram},
  booktitle = {Proceedings of The 37th International Conference on Algorithmic Learning Theory},
  pages     = {1--23},
  year      = {2026},
  editor    = {Telgarsky, Matus and Ullman, Jonathan},
  volume    = {313},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v313/main/assets/canonne26a/canonne26a.pdf},
  url       = {https://proceedings.mlr.press/v313/canonne26a.html},
  abstract  = {We consider the fundamental task of two-sample composite hypothesis testing (i.e., "closeness testing") in a distributed setting, where a central party holds a dataset of $M \gg 1$ observations from an unknown discrete probability distribution $q$ over a universe of size $k$, and individual parties each independently observe one realization from an unknown distribution $p$. The goal is for the central party to test whether $p$ and $q$ are equal, or differ significantly in statistical distance, while only receiving a small amount of information (at most $\ell \leq \log_2 k$ bits) from each of the $n$ distributed entities. Our main contribution is a time- and sample-efficient algorithm for this task, applicable across the whole regime of parameters. Our theoretical guarantees match the optimal sample complexities in the specific cases already studied in the literature, e.g., when $\ell = \log_2 k$ (no information constraint) or $M \to \infty$ (reference distribution fully known to the central party).}
}
Endnote
%0 Conference Paper
%T Closeness testing from distributed measurements
%A Clement Louis Canonne
%A Aditya Vikram Singh
%B Proceedings of The 37th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Matus Telgarsky
%E Jonathan Ullman
%F pmlr-v313-canonne26a
%I PMLR
%P 1--23
%U https://proceedings.mlr.press/v313/canonne26a.html
%V 313
%X We consider the fundamental task of two-sample composite hypothesis testing (i.e., "closeness testing") in a distributed setting, where a central party holds a dataset of $M \gg 1$ observations from an unknown discrete probability distribution $q$ over a universe of size $k$, and individual parties each independently observe one realization from an unknown distribution $p$. The goal is for the central party to test whether $p$ and $q$ are equal, or differ significantly in statistical distance, while only receiving a small amount of information (at most $\ell \leq \log_2 k$ bits) from each of the $n$ distributed entities. Our main contribution is a time- and sample-efficient algorithm for this task, applicable across the whole regime of parameters. Our theoretical guarantees match the optimal sample complexities in the specific cases already studied in the literature, e.g., when $\ell = \log_2 k$ (no information constraint) or $M \to \infty$ (reference distribution fully known to the central party).
APA
Canonne, C.L. & Singh, A.V. (2026). Closeness testing from distributed measurements. Proceedings of The 37th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 313:1-23. Available from https://proceedings.mlr.press/v313/canonne26a.html.