Dual Query: Practical Private Query Release for High Dimensional Data

Marco Gaboardi; Emilio Jesus Gallego Arias; Justin Hsu; Aaron Roth; Zhiwei Steven Wu

Dual Query: Practical Private Query Release for High Dimensional Data

Marco Gaboardi, Emilio Jesus Gallego Arias, Justin Hsu, Aaron Roth, Zhiwei Steven Wu

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1170-1178, 2014.

Abstract

We present a practical, differentially private algorithm for answering a large number of queries on high dimensional datasets. Like all algorithms for this task, ours necessarily has worst-case complexity exponential in the dimension of the data. However, our algorithm packages the computationally hard step into a concisely defined integer program, which can be solved non-privately using standard solvers. We prove accuracy and privacy theorems for our algorithm, and then demonstrate experimentally that our algorithm performs well in practice. For example, our algorithm can efficiently and accurately answer millions of queries on the Netflix dataset, which has over 17,000 attributes; this is an improvement on the state of the art by multiple orders of magnitude.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-gaboardi14,
  title = 	 {Dual Query: Practical Private Query Release for High Dimensional Data},
  author = 	 {Gaboardi, Marco and Arias, Emilio Jesus Gallego and Hsu, Justin and Roth, Aaron and Wu, Zhiwei Steven},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {1170--1178},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/gaboardi14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/gaboardi14.html},
  abstract = 	 {We present a practical, differentially private algorithm for answering a large number of queries on high dimensional datasets. Like all algorithms for this task, ours necessarily has worst-case complexity exponential in the dimension of the data. However, our algorithm packages the computationally hard step into a concisely defined integer program, which can be solved non-privately using standard solvers. We prove accuracy and privacy theorems for our algorithm, and then demonstrate experimentally that our algorithm performs well in practice. For example,  our algorithm can efficiently and accurately answer millions of queries on the Netflix dataset, which has over 17,000 attributes; this is an improvement on the state of the art by multiple orders of magnitude.}
}

Endnote

%0 Conference Paper
%T Dual Query: Practical Private Query Release for High Dimensional Data
%A Marco Gaboardi
%A Emilio Jesus Gallego Arias
%A Justin Hsu
%A Aaron Roth
%A Zhiwei Steven Wu
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-gaboardi14
%I PMLR
%P 1170--1178
%U https://proceedings.mlr.press/v32/gaboardi14.html
%V 32
%N 2
%X We present a practical, differentially private algorithm for answering a large number of queries on high dimensional datasets. Like all algorithms for this task, ours necessarily has worst-case complexity exponential in the dimension of the data. However, our algorithm packages the computationally hard step into a concisely defined integer program, which can be solved non-privately using standard solvers. We prove accuracy and privacy theorems for our algorithm, and then demonstrate experimentally that our algorithm performs well in practice. For example,  our algorithm can efficiently and accurately answer millions of queries on the Netflix dataset, which has over 17,000 attributes; this is an improvement on the state of the art by multiple orders of magnitude.

RIS


TY  - CPAPER
TI  - Dual Query: Practical Private Query Release for High Dimensional Data
AU  - Marco Gaboardi
AU  - Emilio Jesus Gallego Arias
AU  - Justin Hsu
AU  - Aaron Roth
AU  - Zhiwei Steven Wu
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-gaboardi14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 1170
EP  - 1178
L1  - http://proceedings.mlr.press/v32/gaboardi14.pdf
UR  - https://proceedings.mlr.press/v32/gaboardi14.html
AB  - We present a practical, differentially private algorithm for answering a large number of queries on high dimensional datasets. Like all algorithms for this task, ours necessarily has worst-case complexity exponential in the dimension of the data. However, our algorithm packages the computationally hard step into a concisely defined integer program, which can be solved non-privately using standard solvers. We prove accuracy and privacy theorems for our algorithm, and then demonstrate experimentally that our algorithm performs well in practice. For example,  our algorithm can efficiently and accurately answer millions of queries on the Netflix dataset, which has over 17,000 attributes; this is an improvement on the state of the art by multiple orders of magnitude.
ER  -

APA


Gaboardi, M., Arias, E.J.G., Hsu, J., Roth, A. & Wu, Z.S.. (2014). Dual Query: Practical Private Query Release for High Dimensional Data. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1170-1178 Available from https://proceedings.mlr.press/v32/gaboardi14.html.

Related Material

Download PDF