Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:11998-12011, 2023.
Differential privacy (DP) is by far the most widely accepted framework for mitigating privacy risks in machine learning. However, exactly how small the privacy parameter $\epsilon$ needs to be to protect against certain privacy risks in practice is still not well-understood. In this work, we study data reconstruction attacks for discrete data and analyze it under the framework of multiple hypothesis testing. For a learning algorithm satisfying $(\alpha, \epsilon)$-Renyi DP, we utilize different variants of the celebrated Fano’s inequality to upper bound the attack advantage of a data reconstruction adversary. Our bound can be numerically computed to relate the parameter $\epsilon$ to the desired level of privacy protection in practice, and complements the empirical evidence for the effectiveness of DP against data reconstruction attacks even at relatively large values of $\epsilon$.