Multi-Task Differential Privacy Under Distribution Skew

Walid Krichene, Prateek Jain, Shuang Song, Mukund Sundararajan, Abhradeep Guha Thakurta, Li Zhang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:17784-17807, 2023.

Abstract

We study the problem of multi-task learning under user-level differential privacy, in which n users contribute data to m tasks, each involving a subset of users. One important aspect of the problem, which can significantly impact quality, is the distribution skew among tasks: tasks with far fewer data samples than others are more susceptible to the noise added for privacy. It is natural to ask whether algorithms can adapt to this skew to improve overall utility. We give a systematic analysis of the problem by studying how to optimally allocate a user's privacy budget among tasks. We propose a generic algorithm, based on an adaptive reweighting of the empirical loss, and show that in the presence of distribution skew it yields a quantifiable improvement in excess empirical risk. Experimental studies on recommendation problems that exhibit a long tail of small tasks demonstrate that our methods significantly improve utility, achieving the state of the art on two standard benchmarks.
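To make the budget-allocation idea concrete, the following is a minimal toy sketch (not the paper's actual algorithm, whose optimal weighting is derived in the full text): each user's contribution to task k is scaled by a hypothetical weight w_k proportional to n_k^alpha, the weight vector is normalized so the combined user-level L2 sensitivity stays bounded, and Gaussian noise is added once per task. Larger weights shrink a task's effective noise, illustrating how the skew in task sizes interacts with the privacy budget. All names and the choice alpha = 0.5 are illustrative assumptions.

```python
import numpy as np

def reweighted_noisy_task_means(task_data, clip=1.0, noise_mult=1.0, alpha=0.5, seed=0):
    """Toy sketch of skew-aware budget allocation across tasks.

    task_data: list of per-task sample lists (one scalar per user contribution).
    The weights w_k ~ n_k**alpha are a hypothetical allocation rule, not the
    optimal one derived in the paper.
    """
    rng = np.random.default_rng(seed)
    sizes = np.array([len(x) for x in task_data], dtype=float)
    w = sizes**alpha
    w /= np.linalg.norm(w)  # normalize so total user-level L2 sensitivity is <= clip
    noisy_means = []
    for x, wk, nk in zip(task_data, w, sizes):
        x = np.clip(np.asarray(x, dtype=float), -clip, clip)
        # the weighted sum gets noise of scale noise_mult * clip; dividing by
        # (wk * nk) to recover the mean leaves noise of scale below -- so a
        # larger budget share wk means less noise on that task's estimate
        sigma = noise_mult * clip / (wk * nk)
        noisy_means.append(x.mean() + rng.normal(0.0, sigma))
    return np.array(noisy_means)
```

With noise_mult set to 0 the function reduces to plain clipped per-task means, which makes the privacy/utility trade-off easy to probe by sweeping the noise multiplier.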

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-krichene23a,
  title     = {Multi-Task Differential Privacy Under Distribution Skew},
  author    = {Krichene, Walid and Jain, Prateek and Song, Shuang and Sundararajan, Mukund and Guha Thakurta, Abhradeep and Zhang, Li},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {17784--17807},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/krichene23a/krichene23a.pdf},
  url       = {https://proceedings.mlr.press/v202/krichene23a.html},
  abstract  = {We study the problem of multi-task learning under user-level differential privacy, in which n users contribute data to m tasks, each involving a subset of users. One important aspect of the problem, that can significantly impact quality, is the distribution skew among tasks. Tasks that have much fewer data samples than others are more susceptible to the noise added for privacy. It is natural to ask whether algorithms can adapt to this skew to improve the overall utility. We give a systematic analysis of the problem, by studying how to optimally allocate a user’s privacy budget among tasks. We propose a generic algorithm, based on an adaptive reweighting of the empirical loss, and show that in the presence of distribution skew, this gives a quantifiable improvement of excess empirical risk. Experimental studies on recommendation problems that exhibit a long tail of small tasks, demonstrate that our methods significantly improve utility, achieving the state of the art on two standard benchmarks.}
}
Endnote
%0 Conference Paper
%T Multi-Task Differential Privacy Under Distribution Skew
%A Walid Krichene
%A Prateek Jain
%A Shuang Song
%A Mukund Sundararajan
%A Abhradeep Guha Thakurta
%A Li Zhang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-krichene23a
%I PMLR
%P 17784--17807
%U https://proceedings.mlr.press/v202/krichene23a.html
%V 202
%X We study the problem of multi-task learning under user-level differential privacy, in which n users contribute data to m tasks, each involving a subset of users. One important aspect of the problem, that can significantly impact quality, is the distribution skew among tasks. Tasks that have much fewer data samples than others are more susceptible to the noise added for privacy. It is natural to ask whether algorithms can adapt to this skew to improve the overall utility. We give a systematic analysis of the problem, by studying how to optimally allocate a user’s privacy budget among tasks. We propose a generic algorithm, based on an adaptive reweighting of the empirical loss, and show that in the presence of distribution skew, this gives a quantifiable improvement of excess empirical risk. Experimental studies on recommendation problems that exhibit a long tail of small tasks, demonstrate that our methods significantly improve utility, achieving the state of the art on two standard benchmarks.
APA
Krichene, W., Jain, P., Song, S., Sundararajan, M., Guha Thakurta, A. & Zhang, L. (2023). Multi-Task Differential Privacy Under Distribution Skew. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:17784-17807. Available from https://proceedings.mlr.press/v202/krichene23a.html.