Inference under Information Constraints: Lower Bounds from Chi-Square Contraction

[edit]

Jayadev Acharya, Clément L Canonne, Himanshu Tyagi ;
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:3-17, 2019.

Abstract

Multiple users getting one sample each from an unknown distribution seek to enable a central server to conduct statistical inference. However, each player can only provide limited amount of information about its sample to the server. We propose a unified framework to study such distributed inference problems under local information constraints. We model the local information constraints by a set of channels $\mathcal{W}$: each player chooses a channel from $\mathcal{W}$, and then passes their data through this channel before transmitting the output to the server. The goal in this distributed setting is to understand the blow-up in data requirement imposed by the information constraints, compared to the centralized setting where all data samples are available to the server. We introduce two notions of \emph{chi-square fluctuations} which provide bounds for the average distance and the distance to the average of a local perturbation. When information constraints are imposed, by the standard data-processing inequality, pairwise distances contract and so do our chi-square fluctuations. We provide a precise characterization of this contraction for discrete $k$-ary distributions and use it to obtain to general lower bounds for distribution learning and testing under information constraints. Our results involve notions of minmax and maxmin chi-square fluctuations, where the maximum is over the choice of channels and the minimum is over perturbations. The former emerges when considering public-coin protocols for testing and is bounded in terms of Frobenius norm of a positive semidefinite matrix $H$, a function of the channel family $\mathcal{W}$. The latter appears for private-coin protocols and is bounded by the nuclear norm of $H$ which is smaller than its Frobenius norm, establishing a separation between the sample complexity of testing using public and private coins.

Related Material