Priv’IT: Private and Sample Efficient Identity Testing

Bryan Cai; Constantinos Daskalakis; Gautam Kamath

Priv’IT: Private and Sample Efficient Identity Testing

Bryan Cai, Constantinos Daskalakis, Gautam Kamath

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:635-644, 2017.

Abstract

We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\epsilon$, accuracy parameter $\alpha$, and requirements $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_\mathrm{tv}(p,q) \ge \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\epsilon,0)$-differential privacy, and guarantees $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-cai17a,
  title = 	 {{P}riv'{IT}: Private and Sample Efficient Identity Testing},
  author =       {Bryan Cai and Constantinos Daskalakis and Gautam Kamath},
  booktitle = 	 {Proceedings of the 34th International Conference on Machine Learning},
  pages = 	 {635--644},
  year = 	 {2017},
  editor = 	 {Precup, Doina and Teh, Yee Whye},
  volume = 	 {70},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--11 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v70/cai17a/cai17a.pdf},
  url = 	 {https://proceedings.mlr.press/v70/cai17a.html},
  abstract = 	 {We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\epsilon$, accuracy parameter $\alpha$, and requirements $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_\mathrm{tv}(p,q) \ge \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\epsilon,0)$-differential privacy, and guarantees $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.}
}

Endnote

%0 Conference Paper
%T Priv’IT: Private and Sample Efficient Identity Testing
%A Bryan Cai
%A Constantinos Daskalakis
%A Gautam Kamath
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh	
%F pmlr-v70-cai17a
%I PMLR
%P 635--644
%U https://proceedings.mlr.press/v70/cai17a.html
%V 70
%X We develop differentially private hypothesis testing methods for the small sample regime. Given a sample $\mathcal{D}$ from a categorical distribution $p$ over some domain $\Sigma$, an explicitly described distribution $q$ over $\Sigma$, some privacy parameter $\epsilon$, accuracy parameter $\alpha$, and requirements $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ for the type I and type II errors of our test, the goal is to distinguish between $p=q$ and $d_\mathrm{tv}(p,q) \ge \alpha$. We provide theoretical bounds for the sample size $|\mathcal{D}|$ so that our method both satisfies $(\epsilon,0)$-differential privacy, and guarantees $\beta_\mathrm{I}$ and $\beta_\mathrm{II}$ type I and type II errors. We show that differential privacy may come for free in some regimes of parameters, and we always beat the sample complexity resulting from running the $\chi^2$-test with noisy counts, or standard approaches such as repetition for endowing non-private $\chi^2$-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.

APA


Cai, B., Daskalakis, C. & Kamath, G.. (2017). Priv’IT: Private and Sample Efficient Identity Testing. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:635-644 Available from https://proceedings.mlr.press/v70/cai17a.html.

Priv’IT: Private and Sample Efficient Identity Testing

Abstract

Cite this Paper

Related Material