Linear Queries Estimation with Local Differential Privacy

Raef Bassily
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:721-729, 2019.

Abstract

We study the problem of estimating a set of $d$ linear queries with respect to some unknown distribution $p$ over a domain $[J]$ based on a sensitive data set of $n$ individuals, under the constraint of local differential privacy. This problem subsumes a wide range of estimation tasks, e.g., distribution estimation and $d$-dimensional mean estimation. We provide new algorithms for both the offline (non-adaptive) and adaptive versions of this problem. In the offline setting, the set of queries is fixed before the algorithm starts. In the regime where $n < d^2/\log(J)$, our algorithms attain $L_2$ estimation error that is independent of $d$. For the special case of distribution estimation, we show that projecting the output estimate of an algorithm due to [Acharya et al. 2018] onto the probability simplex yields an $L_2$ error that depends only sub-logarithmically on $J$ in the regime where $n < J^2/\log(J)$. Our bounds are within a factor of at most $(\log(J))^{1/4}$ of the optimal $L_2$ error. These results show the possibility of accurate estimation of linear queries in high-dimensional settings under the $L_2$ error criterion. In the adaptive setting, the queries are generated over $d$ rounds, one query at a time. In each round, a query can be chosen adaptively based on the entire history of previous queries and answers. We give an algorithm for this problem with optimal $L_{\infty}$ estimation error (worst error in the estimated values for the queries w.r.t. the data distribution). Our bound matches a lower bound on the $L_{\infty}$ error in the offline version of this problem [Duchi et al. 2013].
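The projection step mentioned in the abstract, i.e., mapping an LDP frequency estimate to the nearest point of the probability simplex in $L_2$, is the standard Euclidean projection onto the simplex. Below is a minimal sketch of that post-processing step using the well-known sort-based procedure; the names `project_to_simplex` and `p_hat` are illustrative and not from the paper, and the LDP estimator of [Acharya et al. 2018] itself is not reproduced here, so any (possibly infeasible) unbiased estimate can stand in for `p_hat`.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean (L2) projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the standard sort-based algorithm."""
    v = np.asarray(v, dtype=float)
    J = v.size
    u = np.sort(v)[::-1]                       # coordinates in descending order
    cssv = np.cumsum(u) - 1.0                  # cumulative sums minus total simplex mass
    rho = np.nonzero(u * np.arange(1, J + 1) > cssv)[0][-1]
    theta = cssv[rho] / (rho + 1.0)            # threshold enforcing sum(x) = 1
    return np.maximum(v - theta, 0.0)

# Hypothetical usage: p_hat is a noisy LDP frequency estimate; projecting it
# onto the simplex can only decrease its L2 distance to the true distribution p.
p_hat = np.array([0.4, -0.05, 0.35, 0.35])
p_tilde = project_to_simplex(p_hat)
print(p_tilde, p_tilde.sum())
```

Since the simplex is convex, this projection is a contraction in $L_2$; it is exactly this kind of post-processing that lets the projected estimate inherit (and improve on) the $L_2$ guarantee of the underlying LDP estimator.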

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-bassily19a,
  title     = {Linear Queries Estimation with Local Differential Privacy},
  author    = {Bassily, Raef},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {721--729},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/bassily19a/bassily19a.pdf},
  url       = {https://proceedings.mlr.press/v89/bassily19a.html}
}
Endnote
%0 Conference Paper
%T Linear Queries Estimation with Local Differential Privacy
%A Raef Bassily
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-bassily19a
%I PMLR
%P 721--729
%U https://proceedings.mlr.press/v89/bassily19a.html
%V 89
APA
Bassily, R. (2019). Linear Queries Estimation with Local Differential Privacy. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:721-729. Available from https://proceedings.mlr.press/v89/bassily19a.html.