InstaHide: Instance-hiding Schemes for Private Distributed Learning
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4507-4518, 2020.
Abstract
How can multiple distributed entities train a shared deep net on their private data while protecting data privacy? This paper introduces InstaHide, a simple encryption of training images. Encrypted images can be used in standard deep-learning pipelines (PyTorch, Federated Learning, etc.) with no additional setup or infrastructure. The encryption has only a minor effect on test accuracy (unlike differential privacy). Encryption consists of mixing the image with a set of other images (in the sense of the Mixup data-augmentation technique (Zhang et al., 2018)), followed by applying a random pixel-wise mask to the mixed image. Other contributions of this paper are: (a) Use of a large public dataset of images (e.g. ImageNet) for mixing during encryption; this improves security. (b) Experiments demonstrating effectiveness in protecting privacy against known attacks while preserving model accuracy. (c) Theoretical analysis showing that successfully attacking privacy requires attackers to solve a difficult computational problem. (d) Demonstration that Mixup alone is insecure (contrary to recent proposals), by exhibiting some efficient attacks. (e) Release of a challenge dataset to allow the design of new attacks.
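To make the encryption step concrete, the following is a minimal NumPy sketch of the scheme as the abstract describes it: mix one private image with several others Mixup-style, then apply a random pixel-wise mask (here a random sign flip per pixel). The function name, the Dirichlet draw for the mixing coefficients, and the assumption that pixels are normalized to [-1, 1] are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def instahide_encrypt(private_image, public_images, k=4, rng=None):
    """Hypothetical sketch of InstaHide-style encryption for one image.

    private_image : array of shape (H, W, C), pixels normalized to [-1, 1]
    public_images : sequence of public images with the same shape
    k             : total number of images mixed together
    """
    rng = np.random.default_rng() if rng is None else rng

    # Pick k-1 public images (e.g. from ImageNet) to mix with the private one.
    idx = rng.choice(len(public_images), size=k - 1, replace=False)
    batch = [private_image] + [public_images[i] for i in idx]

    # Random nonnegative mixing coefficients summing to 1 (Mixup-style).
    # The Dirichlet draw is an assumption; the paper constrains the
    # coefficients differently.
    lam = rng.dirichlet(np.ones(k))
    mixed = sum(l * img for l, img in zip(lam, batch))

    # Random pixel-wise mask: independently flip the sign of each pixel.
    mask = rng.choice([-1.0, 1.0], size=mixed.shape)
    return mask * mixed, lam
```

The encrypted output can be fed to a standard training pipeline in place of the original image, which is why no extra infrastructure is needed; only the private labels would be mixed with the corresponding coefficients when forming the training target.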