Priv’IT: Private and Sample Efficient Identity Testing

Bryan Cai, Constantinos Daskalakis, Gautam Kamath
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:635-644, 2017.

Abstract

We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\epsilon$, accuracy parameter $\alpha$, and requirements $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_\mathrm{tv}(p,q) \ge \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\epsilon,0)$-differential privacy, and guarantees $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-cai17a, title = {{P}riv'{IT}: Private and Sample Efficient Identity Testing}, author = {Bryan Cai and Constantinos Daskalakis and Gautam Kamath}, booktitle = {Proceedings of the 34th International Conference on Machine Learning}, pages = {635--644}, year = {2017}, editor = {Precup, Doina and Teh, Yee Whye}, volume = {70}, series = {Proceedings of Machine Learning Research}, month = {06--11 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v70/cai17a/cai17a.pdf}, url = { http://proceedings.mlr.press/v70/cai17a.html }, abstract = {We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\epsilon$, accuracy parameter $\alpha$, and requirements $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_\mathrm{tv}(p,q) \ge \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\epsilon,0)$-differential privacy, and guarantees $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.} }
Endnote
%0 Conference Paper %T Priv’IT: Private and Sample Efficient Identity Testing %A Bryan Cai %A Constantinos Daskalakis %A Gautam Kamath %B Proceedings of the 34th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2017 %E Doina Precup %E Yee Whye Teh %F pmlr-v70-cai17a %I PMLR %P 635--644 %U http://proceedings.mlr.press/v70/cai17a.html %V 70 %X We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\epsilon$, accuracy parameter $\alpha$, and requirements $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_\mathrm{tv}(p,q) \ge \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\epsilon,0)$-differential privacy, and guarantees $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.
APA
Cai, B., Daskalakis, C. & Kamath, G.. (2017). Priv’IT: Private and Sample Efficient Identity Testing. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:635-644 Available from http://proceedings.mlr.press/v70/cai17a.html .

Related Material