New Oracle-Efficient Algorithms for Private Synthetic Data Release

Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, Steven Wu
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9765-9774, 2020.

Abstract

We present three new algorithms for constructing differentially private synthetic data—a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle’s optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism (McKenna et al. VLDB 2018), our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss $\eps$).

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-vietri20b, title = {New Oracle-Efficient Algorithms for Private Synthetic Data Release}, author = {Vietri, Giuseppe and Tian, Grace and Bun, Mark and Steinke, Thomas and Wu, Steven}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {9765--9774}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/vietri20b/vietri20b.pdf}, url = {https://proceedings.mlr.press/v119/vietri20b.html}, abstract = {We present three new algorithms for constructing differentially private synthetic data—a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle’s optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism (McKenna et al. VLDB 2018), our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss $\eps$).} }
Endnote
%0 Conference Paper %T New Oracle-Efficient Algorithms for Private Synthetic Data Release %A Giuseppe Vietri %A Grace Tian %A Mark Bun %A Thomas Steinke %A Steven Wu %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-vietri20b %I PMLR %P 9765--9774 %U https://proceedings.mlr.press/v119/vietri20b.html %V 119 %X We present three new algorithms for constructing differentially private synthetic data—a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle’s optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism (McKenna et al. VLDB 2018), our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss $\eps$).
APA
Vietri, G., Tian, G., Bun, M., Steinke, T. & Wu, S.. (2020). New Oracle-Efficient Algorithms for Private Synthetic Data Release. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9765-9774 Available from https://proceedings.mlr.press/v119/vietri20b.html.

Related Material