Profile Reconstruction from Private Sketches

Hao Wu, Rasmus Pagh
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:53793-53816, 2024.

Abstract

Given a multiset of $n$ items from $\mathcal{D}$, the profile reconstruction problem is to estimate, for $t = 0, 1, …, n$, the fraction $\vec{f}[t]$ of items in $\mathcal{D}$ that appear exactly $t$ times. We consider differentially private profile estimation in a distributed, space-constrained setting where we wish to maintain an updatable, private sketch of the multiset that allows us to compute an approximation of $\vec{f} = (\vec{f}[0], …, \vec{f}[n])$. Using a histogram privatized using discrete Laplace noise, we show how to “reverse” the noise using an approach of Dwork et al. (ITCS ’10). We show how to speed up the algorithm from polynomial time to $O(d + n \log n)$, and analyze the achievable error in the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms. In all cases the dependency of the error on $d = |\mathcal{D}|$ is $O( 1 / \sqrt{d})$ — we give an information-theoretic lower bound showing that this dependence on $d$ is asymptotically optimal among all private, updatable sketches for the profile reconstruction problem with a high-probability error guarantee.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wu24w, title = {Profile Reconstruction from Private Sketches}, author = {Wu, Hao and Pagh, Rasmus}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {53793--53816}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wu24w/wu24w.pdf}, url = {https://proceedings.mlr.press/v235/wu24w.html}, abstract = {Given a multiset of $n$ items from $\mathcal{D}$, the profile reconstruction problem is to estimate, for $t = 0, 1, …, n$, the fraction $\vec{f}[t]$ of items in $\mathcal{D}$ that appear exactly $t$ times. We consider differentially private profile estimation in a distributed, space-constrained setting where we wish to maintain an updatable, private sketch of the multiset that allows us to compute an approximation of $\vec{f} = (\vec{f}[0], …, \vec{f}[n])$. Using a histogram privatized using discrete Laplace noise, we show how to “reverse” the noise using an approach of Dwork et al. (ITCS ’10). We show how to speed up the algorithm from polynomial time to $O(d + n \log n)$, and analyze the achievable error in the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms. In all cases the dependency of the error on $d = |\mathcal{D}|$ is $O( 1 / \sqrt{d})$ — we give an information-theoretic lower bound showing that this dependence on $d$ is asymptotically optimal among all private, updatable sketches for the profile reconstruction problem with a high-probability error guarantee.} }
Endnote
%0 Conference Paper %T Profile Reconstruction from Private Sketches %A Hao Wu %A Rasmus Pagh %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-wu24w %I PMLR %P 53793--53816 %U https://proceedings.mlr.press/v235/wu24w.html %V 235 %X Given a multiset of $n$ items from $\mathcal{D}$, the profile reconstruction problem is to estimate, for $t = 0, 1, …, n$, the fraction $\vec{f}[t]$ of items in $\mathcal{D}$ that appear exactly $t$ times. We consider differentially private profile estimation in a distributed, space-constrained setting where we wish to maintain an updatable, private sketch of the multiset that allows us to compute an approximation of $\vec{f} = (\vec{f}[0], …, \vec{f}[n])$. Using a histogram privatized using discrete Laplace noise, we show how to “reverse” the noise using an approach of Dwork et al. (ITCS ’10). We show how to speed up the algorithm from polynomial time to $O(d + n \log n)$, and analyze the achievable error in the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms. In all cases the dependency of the error on $d = |\mathcal{D}|$ is $O( 1 / \sqrt{d})$ — we give an information-theoretic lower bound showing that this dependence on $d$ is asymptotically optimal among all private, updatable sketches for the profile reconstruction problem with a high-probability error guarantee.
APA
Wu, H. & Pagh, R.. (2024). Profile Reconstruction from Private Sketches. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:53793-53816 Available from https://proceedings.mlr.press/v235/wu24w.html.

Related Material