Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting

Jonathan Hehir, Daniel Ting, Graham Cormode
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:12846-12865, 2023.

Abstract

Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building block in data warehouses. Although many practical sketches for cardinality estimation exist, none provide privacy when merging. We propose the first practical cardinality sketches that are simultaneously mergeable, differentially private (DP), and have low empirical errors. These introduce a novel randomized algorithm for performing logical operations on noisy bits, a tight privacy analysis, and provably optimal estimation. Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-hehir23a, title = {Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting}, author = {Hehir, Jonathan and Ting, Daniel and Cormode, Graham}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {12846--12865}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/hehir23a/hehir23a.pdf}, url = {https://proceedings.mlr.press/v202/hehir23a.html}, abstract = {Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building block in data warehouses. Although many practical sketches for cardinality estimation exist, none provide privacy when merging. We propose the first practical cardinality sketches that are simultaneously mergeable, differentially private (DP), and have low empirical errors. These introduce a novel randomized algorithm for performing logical operations on noisy bits, a tight privacy analysis, and provably optimal estimation. Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data.} }
Endnote
%0 Conference Paper %T Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting %A Jonathan Hehir %A Daniel Ting %A Graham Cormode %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-hehir23a %I PMLR %P 12846--12865 %U https://proceedings.mlr.press/v202/hehir23a.html %V 202 %X Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building block in data warehouses. Although many practical sketches for cardinality estimation exist, none provide privacy when merging. We propose the first practical cardinality sketches that are simultaneously mergeable, differentially private (DP), and have low empirical errors. These introduce a novel randomized algorithm for performing logical operations on noisy bits, a tight privacy analysis, and provably optimal estimation. Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data.
APA
Hehir, J., Ting, D. & Cormode, G.. (2023). Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:12846-12865 Available from https://proceedings.mlr.press/v202/hehir23a.html.

Related Material