Algorithmically Effective Differentially Private Synthetic Data

Yiyun He, Roman Vershynin, Yizhe Zhu
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:3941-3968, 2023.

Abstract

We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $\mathcal X$ in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $\mathcal Y$ such that the expected 1-Wasserstein distance between the empirical measures of $\mathcal X$ and $\mathcal Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$, and is $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. The accuracy guarantee is optimal up to a constant factor for $d\geq 2$, and up to a logarithmic factor for $d=1$. Our algorithm has a fast running time of $O(\varepsilon d n)$ for all $d\geq 1$ and demonstrates improved accuracy compared to the method in Boedihardjo et al. (2022) for $d\geq 2$.
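To make the setting concrete, here is a minimal sketch of a *standard baseline* for this problem: ε-differentially private synthetic data on $[0,1]^d$ via a perturbed histogram (Laplace noise on grid-cell counts, then resampling). This is a generic textbook mechanism, not necessarily the algorithm of the paper; the function name, the choice of swap-neighbor sensitivity 2, and the uniform-grid partition are all illustrative assumptions.

```python
import numpy as np

def dp_histogram_synthetic(X, epsilon, bins_per_dim, rng=None):
    """epsilon-DP synthetic data on [0,1]^d via a noisy histogram.

    Generic perturbed-histogram baseline (Laplace mechanism on cell
    counts), NOT necessarily the algorithm of He, Vershynin & Zhu (2023).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Assign each point to a cell of a regular grid on [0,1]^d.
    idx = np.clip((X * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    flat = np.ravel_multi_index(idx.T, (bins_per_dim,) * d)
    counts = np.bincount(flat, minlength=bins_per_dim ** d).astype(float)
    # Swapping one record changes two cell counts by 1 each, so the
    # L1 sensitivity of the count vector is 2 under swap neighbors.
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=counts.shape)
    # Clip negatives and renormalize to get a sampling distribution.
    noisy = np.clip(noisy, 0.0, None)
    probs = noisy / noisy.sum()
    # Resample n synthetic points: pick a cell, then a uniform point in it.
    cells = rng.choice(noisy.size, size=n, p=probs)
    corners = np.stack(np.unravel_index(cells, (bins_per_dim,) * d), axis=1)
    return (corners + rng.random((n, d))) / bins_per_dim
```

With `bins_per_dim` on the order of $(\varepsilon n)^{1/d}$, this baseline already attains the $O((\varepsilon n)^{-1/d})$ Wasserstein rate for fixed $d \geq 2$; the paper's contribution includes the matching lower bound, the $d=1$ refinement, and the $O(\varepsilon d n)$ running time.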

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-he23a,
  title     = {Algorithmically Effective Differentially Private Synthetic Data},
  author    = {He, Yiyun and Vershynin, Roman and Zhu, Yizhe},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {3941--3968},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/he23a/he23a.pdf},
  url       = {https://proceedings.mlr.press/v195/he23a.html},
  abstract  = {We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $\mathcal X$ in the hypercube $[0,1]^d$, our algorithm generates synthetic dataset $\mathcal Y$ such that the expected 1-Wasserstein distance between the empirical measure of $\mathcal X$ and $\mathcal Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$, and is $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. The accuracy guarantee is optimal up to a constant factor for $d\geq 2$, and up to a logarithmic factor for $d=1$. Our algorithm has a fast running time of $O(\varepsilon d n)$ for all $d\geq 1$ and demonstrates improved accuracy compared to the method in Boedihardjo et al. (2022) for $d\geq 2$.}
}
Endnote
%0 Conference Paper
%T Algorithmically Effective Differentially Private Synthetic Data
%A Yiyun He
%A Roman Vershynin
%A Yizhe Zhu
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-he23a
%I PMLR
%P 3941--3968
%U https://proceedings.mlr.press/v195/he23a.html
%V 195
%X We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $\mathcal X$ in the hypercube $[0,1]^d$, our algorithm generates synthetic dataset $\mathcal Y$ such that the expected 1-Wasserstein distance between the empirical measure of $\mathcal X$ and $\mathcal Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$, and is $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. The accuracy guarantee is optimal up to a constant factor for $d\geq 2$, and up to a logarithmic factor for $d=1$. Our algorithm has a fast running time of $O(\varepsilon d n)$ for all $d\geq 1$ and demonstrates improved accuracy compared to the method in Boedihardjo et al. (2022) for $d\geq 2$.
APA
He, Y., Vershynin, R., & Zhu, Y. (2023). Algorithmically Effective Differentially Private Synthetic Data. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:3941-3968. Available from https://proceedings.mlr.press/v195/he23a.html.