The communication cost of security and privacy in federated frequency estimation
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:4247-4274, 2023.
We consider the federated frequency estimation problem, where each user holds a private item $X_i$ from a size-$d$ domain and a server aims to estimate the empirical frequency (i.e., histogram) of $n$ items with $n \ll d$. Without any security and privacy considerations, each user can communicate its item to the server by using $\log d$ bits. A naive application of secure aggregation protocols would, however, require $d\log n$ bits per user. Can we reduce the communication needed for secure aggregation, and does security come with a fundamental cost in communication? In this paper, we develop an information-theoretic model for secure aggregation that allows us to characterize the fundamental cost of security and privacy in terms of communication. We show that with security (and without privacy) $\Omega\left( n \log d \right)$ bits per user are necessary and sufficient to allow the server to compute the frequency distribution. This is significantly smaller than the $d\log n$ bits per user needed by the naive scheme but significantly higher than the $\log d$ bits per user needed without security. To achieve differential privacy, we construct a linear scheme based on a noisy sketch that locally perturbs the data and does not require a trusted server (a.k.a. distributed differential privacy). We analyze this scheme under $\ell_2$ and $\ell_\infty$ loss. By using our information-theoretic framework, we show that the scheme achieves the optimal accuracy-privacy trade-off with optimal communication cost, while matching the performance in the centralized case where data is stored in the central server.