Algorithmically Effective Differentially Private Synthetic Data
Proceedings of the Thirty-Sixth Conference on Learning Theory, PMLR 195:3941-3968, 2023.
Abstract
We present a highly effective algorithmic approach for generating $\varepsilon$-differentially private synthetic data in a bounded metric space with near-optimal utility guarantees under the 1-Wasserstein distance. In particular, for a dataset $\mathcal X$ in the hypercube $[0,1]^d$, our algorithm generates a synthetic dataset $\mathcal Y$ such that the expected 1-Wasserstein distance between the empirical measures of $\mathcal X$ and $\mathcal Y$ is $O((\varepsilon n)^{-1/d})$ for $d\geq 2$, and is $O(\log^2(\varepsilon n)(\varepsilon n)^{-1})$ for $d=1$. The accuracy guarantee is optimal up to a constant factor for $d\geq 2$, and up to a logarithmic factor for $d=1$. Our algorithm has a fast running time of $O(\varepsilon d n)$ for all $d\geq 1$ and demonstrates improved accuracy compared to the method of Boedihardjo et al. (2022) for $d\geq 2$.
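For intuition about the task the abstract describes, the sketch below shows a classical baseline for private synthetic data on $[0,1]^d$: a perturbed histogram over a uniform grid, where noisy cell counts drive resampling of synthetic points. This is not the algorithm proposed in the paper; the function name `dp_synthetic_histogram`, the grid resolution `m`, and the add/remove notion of neighboring datasets are illustrative assumptions.

```python
import numpy as np

def dp_synthetic_histogram(X, epsilon, m, rng=None):
    """Epsilon-DP synthetic data in [0,1]^d via a perturbed histogram.

    Baseline sketch (NOT the paper's algorithm): partition [0,1]^d into
    m^d cells, add Laplace(1/epsilon) noise to each cell count (sensitivity 1
    under add/remove of a single record), clip to be nonnegative, then
    resample n points uniformly within cells chosen in proportion to the
    noisy counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Assign each data point to a grid cell with index in {0,...,m-1}^d.
    cells = np.minimum((X * m).astype(int), m - 1)
    flat = np.ravel_multi_index(cells.T, (m,) * d)
    counts = np.bincount(flat, minlength=m**d).astype(float)
    # Laplace mechanism on the histogram of counts.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)
    # Resample: pick cells proportional to noisy counts, then place each
    # synthetic point uniformly at random inside its cell.
    probs = noisy / noisy.sum()
    chosen = rng.choice(m**d, size=n, p=probs)
    corners = np.array(np.unravel_index(chosen, (m,) * d)).T
    return (corners + rng.random((n, d))) / m
```

A baseline of this kind does not attain the $O((\varepsilon n)^{-1/d})$ rate stated in the abstract without a careful choice of the partition; the paper's contribution is an algorithm whose accuracy is optimal up to constants for $d\geq 2$ while running in time $O(\varepsilon d n)$.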