Priv’IT: Private and Sample Efficient Identity Testing
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:635-644, 2017.
Abstract
We develop differentially private hypothesis testing methods for the small-sample regime. Given a sample D from a categorical distribution p over some domain Σ, an explicitly described distribution q over Σ, a privacy parameter ϵ, an accuracy parameter α, and bounds β_I and β_II on the type I and type II errors of our test, the goal is to distinguish between p = q and d_TV(p, q) ≥ α. We provide theoretical bounds on the sample size |D| for which our method both satisfies (ϵ, 0)-differential privacy and guarantees type I and type II errors of at most β_I and β_II. We show that differential privacy may come for free in some regimes of parameters, and that we always beat the sample complexity resulting from running the χ²-test on noisy counts, or from standard approaches, such as repetition, for endowing non-private χ²-style statistics with differential privacy guarantees. We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing.
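To make the baseline mentioned in the abstract concrete, below is a minimal sketch of the naive approach the paper compares against: running a χ²-style identity test on Laplace-noised histogram counts. This is not the Priv'IT statistic itself; the function name, the rejection threshold, and the choice of replacement-neighbor sensitivity are illustrative assumptions, and in practice the threshold would have to be calibrated to n, α, β_I, and β_II while accounting for the extra variance introduced by the noise.

```python
import numpy as np


def noisy_chi2_identity_test(samples, q, epsilon, threshold):
    """Illustrative baseline (not the paper's Priv'IT test):
    a chi-squared identity test computed on Laplace-noised counts.

    samples   : 1-D integer array of draws from the unknown distribution p,
                with values in {0, ..., k-1}
    q         : explicitly described reference distribution over the same domain
    epsilon   : differential privacy parameter
    threshold : assumed rejection threshold; calibrating it to the desired
                type I / type II error bounds beta_I, beta_II is the hard part
    """
    k = len(q)
    n = len(samples)
    counts = np.bincount(samples, minlength=k).astype(float)

    # Replacing one sample changes two histogram counts by 1 each, so the
    # histogram has L1 sensitivity 2; adding Laplace(2/epsilon) noise to each
    # count makes the released counts epsilon-differentially private, and the
    # statistic below is just post-processing of those noisy counts.
    noisy_counts = counts + np.random.laplace(scale=2.0 / epsilon, size=k)

    expected = n * np.asarray(q, dtype=float)
    # Pearson-style chi-squared statistic evaluated on the noisy counts.
    stat = np.sum((noisy_counts - expected) ** 2 / np.maximum(expected, 1e-12))

    # Accept "p = q" if the statistic is small; otherwise declare the
    # alternative d_TV(p, q) >= alpha.
    return stat <= threshold


# Example usage with hypothetical parameters:
# q = [0.25, 0.25, 0.25, 0.25]
# samples = np.random.choice(len(q), size=5000, p=q)
# print(noisy_chi2_identity_test(samples, q, epsilon=1.0, threshold=20.0))
```

The added Laplace noise inflates the variance of the statistic, which is why this baseline needs more samples than a non-private χ²-test; the abstract's claim is that the paper's method avoids paying this full penalty.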