[edit]
Asymptotically Optimal Locally Private Heavy Hitters via Parameterized Sketches
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:7766-7798, 2022.
Abstract
We study the frequency estimation problem under the local differential privacy model. Frequency estimation is a fundamental computational question, and differential privacy has become the de-facto standard, with the local version (LDP) affording even greater protection. On large input domains, sketching methods and hierarchical search methods are commonly and successfully, in practice, applied for reducing the size of the domain, and for identifying frequent elements. It is therefore of interest whether the current theoretical analysis of such algorithms is tight, or whether we can obtain algorithms in a similar vein that achieve optimal error guarantee. We introduce two algorithms for LDP frequency estimation. One solves the fundamental frequency oracle problem; the other solves the well-known heavy hitters identification problem. As a function of failure probability, \ensuremath{\beta}, the former achieves optimal worst-case estimation error for every \ensuremath{\beta}; the latter is optimal when \ensuremath{\beta} is at least inverse polynomial in n, the number of users. In each algorithm, server running time and memory usage are tilde{O}(n) and tilde{O}(sqrt{n}), respectively, while user running time and memory usage are both tilde{O}(1). Our frequency-oracle algorithm achieves lower estimation error than Bassily et al. (NeurIPS 2017). On the other hand, our heavy hitters identification method improves the worst-case error of TreeHist (ibid) by a factor of Omega(sqrt{log n}); it avoids invoking error-correcting codes, known to be theoretically powerful, but yet to be implemented.