Public Data-Assisted Mirror Descent for Private Model Training

Ehsan Amid; Arun Ganesh; Rajiv Mathews; Swaroop Ramaswamy; Shuang Song; Thomas Steinke; Vinith M Suriyakumar; Om Thakkar; Abhradeep Thakurta

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:517-535, 2022.

Abstract

In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by the public data as the mirror map. We show that, for linear regression with feature vectors drawn from a non-isotropic sub-Gaussian distribution, our algorithm, PDA-DPMD (a variant of mirror descent), provides population risk guarantees that are asymptotically better than the best known guarantees under DP (without having access to public data), when the number of public data samples is sufficiently large. We further show that our algorithm has natural “noise stability” properties that control the variance due to noise added to ensure DP. We demonstrate the efficacy of our algorithm by showing privacy/utility trade-offs on four benchmark datasets (StackOverflow, WikiText-2, CIFAR-10, and EMNIST). We show that our algorithm not only significantly improves over traditional DP-SGD, which does not have access to public data, but to our knowledge is the first to improve over DP-SGD on models that have been pre-trained with public data.
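The update described in the abstract can be illustrated for the linear regression case. The following is a minimal sketch, not the authors' implementation, built only from the abstract's one-sentence description. It assumes a squared loss, so the public loss used as the mirror map is quadratic and the mirror step reduces to preconditioning the noisy private gradient by the public-data Hessian. The function name, the clipping norm, the noise multiplier, the step size, and the small ridge term that keeps the mirror map strongly convex are all hypothetical choices, and no privacy accounting is performed.

```python
import numpy as np


def pda_dpmd_linear_regression(X_priv, y_priv, X_pub, steps=50, lr=0.5,
                               clip_norm=1.0, noise_multiplier=1.0,
                               ridge=1e-3, seed=0):
    """Illustrative sketch of a public-data-assisted DP mirror descent loop.

    All hyperparameters here are assumptions made for illustration.
    """
    rng = np.random.default_rng(seed)
    n_priv, d = X_priv.shape

    # Hessian of the public squared loss. Because that loss is quadratic,
    # this matrix fully determines the Bregman divergence of the mirror map.
    # The ridge term is an assumption that keeps the map strongly convex.
    H_pub = X_pub.T @ X_pub / X_pub.shape[0] + ridge * np.eye(d)

    theta = np.zeros(d)
    for _ in range(steps):
        # Per-example gradients of 0.5 * (x^T theta - y)^2 on private data.
        residual = X_priv @ theta - y_priv
        grads = residual[:, None] * X_priv

        # Clip each per-example gradient to L2 norm at most clip_norm.
        norms = np.linalg.norm(grads, axis=1)
        scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        clipped = grads * scale[:, None]

        # Gaussian noise on the clipped sum, then average.
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=d)
        noisy_grad = (clipped.sum(axis=0) + noise) / n_priv

        # Mirror step: grad_Psi(theta_new) = grad_Psi(theta) - lr * noisy_grad.
        # For a quadratic Psi this is theta - lr * H_pub^{-1} noisy_grad.
        theta = theta - lr * np.linalg.solve(H_pub, noisy_grad)
    return theta
```

In this quadratic setting the public data enters only through `H_pub`, which rescales the private gradient and the added noise together. With a non-quadratic public loss the mirror step would not have this closed form and would need an inner solver.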

Cite this Paper

BibTeX

@InProceedings{pmlr-v162-amid22a,
  title = 	 {Public Data-Assisted Mirror Descent for Private Model Training},
  author =       {Amid, Ehsan and Ganesh, Arun and Mathews, Rajiv and Ramaswamy, Swaroop and Song, Shuang and Steinke, Thomas and Suriyakumar, Vinith M and Thakkar, Om and Thakurta, Abhradeep},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {517--535},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/amid22a/amid22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/amid22a.html},
  abstract = 	 {In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by the public data as the mirror map. We show that, for linear regression with feature vectors drawn from a non-isotropic sub-Gaussian distribution, our algorithm, PDA-DPMD (a variant of mirror descent), provides population risk guarantees that are asymptotically better than the best known guarantees under DP (without having access to public data), when the number of public data samples is sufficiently large. We further show that our algorithm has natural “noise stability” properties that control the variance due to noise added to ensure DP. We demonstrate the efficacy of our algorithm by showing privacy/utility trade-offs on four benchmark datasets (StackOverflow, WikiText-2, CIFAR-10, and EMNIST). We show that our algorithm not only significantly improves over traditional DP-SGD, which does not have access to public data, but to our knowledge is the first to improve over DP-SGD on models that have been pre-trained with public data.}
}

Endnote

%0 Conference Paper
%T Public Data-Assisted Mirror Descent for Private Model Training
%A Ehsan Amid
%A Arun Ganesh
%A Rajiv Mathews
%A Swaroop Ramaswamy
%A Shuang Song
%A Thomas Steinke
%A Vinith M Suriyakumar
%A Om Thakkar
%A Abhradeep Thakurta
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-amid22a
%I PMLR
%P 517--535
%U https://proceedings.mlr.press/v162/amid22a.html
%V 162
%X In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by the public data as the mirror map. We show that, for linear regression with feature vectors drawn from a non-isotropic sub-Gaussian distribution, our algorithm, PDA-DPMD (a variant of mirror descent), provides population risk guarantees that are asymptotically better than the best known guarantees under DP (without having access to public data), when the number of public data samples is sufficiently large. We further show that our algorithm has natural “noise stability” properties that control the variance due to noise added to ensure DP. We demonstrate the efficacy of our algorithm by showing privacy/utility trade-offs on four benchmark datasets (StackOverflow, WikiText-2, CIFAR-10, and EMNIST). We show that our algorithm not only significantly improves over traditional DP-SGD, which does not have access to public data, but to our knowledge is the first to improve over DP-SGD on models that have been pre-trained with public data.

APA

Amid, E., Ganesh, A., Mathews, R., Ramaswamy, S., Song, S., Steinke, T., Suriyakumar, V. M., Thakkar, O. & Thakurta, A. (2022). Public Data-Assisted Mirror Descent for Private Model Training. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:517-535. Available from https://proceedings.mlr.press/v162/amid22a.html.
