Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit

Jayadev Acharya, Clément L Canonne, Yanjun Han, Ziteng Sun, Himanshu Tyagi
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:3-40, 2020.

Abstract

We study goodness-of-fit of discrete distributions in the distributed setting, where samples are divided among multiple users who can only release a limited amount of information about their samples due to various information constraints. Recently, a subset of the authors showed that having access to a common random seed (i.e., shared randomness) leads to a significant reduction in the sample complexity of this problem. In this work, we provide a complete understanding of the interplay between the amount of shared randomness available, the stringency of information constraints, and the sample complexity of the testing problem by characterizing a tight trade-off between these three parameters. We provide a general distributed goodness-of-fit protocol that, as a function of the amount of shared randomness, interpolates smoothly between the private- and public-coin sample complexities. We complement our upper bound with a general framework to prove lower bounds on the sample complexity of this testing problem under limited shared randomness. Finally, we instantiate our bounds for the two archetypal information constraints of communication and local privacy, and show that our sample complexity bounds are optimal as a function of all the parameters of the problem, including the amount of shared randomness. A key component of our upper bounds is a new primitive of domain compression, a tool that allows us to map distributions to a much smaller domain size while preserving their pairwise distances, using a limited amount of randomness.
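To make the domain compression primitive concrete, here is a minimal illustrative sketch in Python. It is not the paper's construction: the function name compress_domain, the use of NumPy, and the choice of a uniformly random partition of the domain are assumptions made for illustration. The point is only the interface: a shared random seed lets every user apply the identical random mapping from a domain of size k to a much smaller domain of size l, so their compressed samples follow the same induced distribution.

import numpy as np

def compress_domain(samples, k, l, shared_seed):
    """Map samples from [k] to [l] via a random partition of the domain.

    All users derive the same partition from the shared seed, so their
    compressed samples are drawn from the same induced distribution.
    A random partition preserves pairwise distances between distributions
    up to a factor on the order of sqrt(l / k) with constant probability
    (see the paper for the precise guarantee); this sketch only
    illustrates the interface, not the analysis.
    """
    rng = np.random.default_rng(shared_seed)
    part = rng.integers(low=0, high=l, size=k)  # random map [k] -> [l]
    return part[np.asarray(samples)]

# Example: two users compress their samples with the same shared seed,
# hence with the same random mapping.
k, l, seed = 10_000, 64, 2020
u1 = compress_domain([3, 17, 9_999], k, l, seed)
u2 = compress_domain([42, 3], k, l, seed)  # identical mapping as user 1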

Cite this Paper


BibTeX
@InProceedings{pmlr-v125-acharya20a,
  title     = {Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit},
  author    = {Acharya, Jayadev and Canonne, Cl{\'e}ment L and Han, Yanjun and Sun, Ziteng and Tyagi, Himanshu},
  booktitle = {Proceedings of Thirty Third Conference on Learning Theory},
  pages     = {3--40},
  year      = {2020},
  editor    = {Abernethy, Jacob and Agarwal, Shivani},
  volume    = {125},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v125/acharya20a/acharya20a.pdf},
  url       = {https://proceedings.mlr.press/v125/acharya20a.html}
}
Endnote
%0 Conference Paper
%T Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit
%A Jayadev Acharya
%A Clément L Canonne
%A Yanjun Han
%A Ziteng Sun
%A Himanshu Tyagi
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal
%F pmlr-v125-acharya20a
%I PMLR
%P 3--40
%U https://proceedings.mlr.press/v125/acharya20a.html
%V 125
APA
Acharya, J., Canonne, C.L., Han, Y., Sun, Z. & Tyagi, H. (2020). Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:3-40. Available from https://proceedings.mlr.press/v125/acharya20a.html.