Radioactive data: tracing through training

Alexandre Sablayrolles; Matthijs Douze; Cordelia Schmid; Herve Jegou

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8326-8335, 2020.

Abstract

Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (ImageNet), with standard architectures (ResNet-18, VGG-16, DenseNet-121) and training procedures, show that we detect radioactive data with high confidence (p < 0.0001) when only 1% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.
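
The abstract states that detection comes with a p-value but does not say how it is computed. The following is a minimal sketch of one way such a confidence level could be obtained, not the paper's implementation. It assumes (this is not stated in the abstract) that the mark is a fixed direction `u` in a `dim`-dimensional space and that detection measures the cosine similarity between `u` and a vector taken from the trained model. Under the null hypothesis that the model never saw marked data, that vector is treated as uniformly random in direction, and the p-value is the probability of a cosine at least as large as the observed one. The function name `cosine_p_value` and its parameters are hypothetical.

```python
import math


def cosine_p_value(cos_sim: float, dim: int, steps: int = 20000) -> float:
    """Return P(cos(u, v) >= cos_sim) for v uniform on the unit sphere in R^dim.

    Illustrative null model only (an assumption, not taken from the abstract):
    the angle theta between u and v has density proportional to
    sin(theta)^(dim - 2) on [0, pi].
    """
    if dim < 2:
        raise ValueError("dim must be at least 2")
    if not -1.0 <= cos_sim <= 1.0:
        raise ValueError("cos_sim must lie in [-1, 1]")
    if steps < 2 or steps % 2:
        raise ValueError("steps must be a positive even integer")

    def density(theta: float) -> float:
        # Unnormalised density of the angle between u and a random direction.
        return math.sin(theta) ** (dim - 2)

    def integrate(upper: float) -> float:
        # Composite Simpson's rule on [0, upper].
        if upper == 0.0:
            return 0.0
        h = upper / steps
        total = density(0.0) + density(upper)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * density(i * h)
        return total * h / 3.0

    # cos(theta) >= cos_sim is the same event as theta <= acos(cos_sim).
    return integrate(math.acos(cos_sim)) / integrate(math.pi)


if __name__ == "__main__":
    # In 3 dimensions the tail probability is analytically (1 - cos_sim) / 2.
    print(cosine_p_value(0.5, 3))
    # In high dimension even a modest cosine is very unlikely under the null.
    print(cosine_p_value(0.2, 512))
```

The second call illustrates why a small alignment can still be strong evidence: in a high-dimensional space, random directions are almost orthogonal, so the null probability of a noticeable cosine shrinks quickly as `dim` grows.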

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-sablayrolles20a,
  title = {Radioactive data: tracing through training},
  author = {Sablayrolles, Alexandre and Douze, Matthijs and Schmid, Cordelia and Jegou, Herve},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {8326--8335},
  year = {2020},
  editor = {Daumé III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/sablayrolles20a/sablayrolles20a.pdf},
  url = {https://proceedings.mlr.press/v119/sablayrolles20a.html},
  abstract = {Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (Imagenet), with standard architectures (Resnet-18, VGG-16, Densenet-121) and training procedures, show that we detect radioactive data with high confidence (p<0.0001) when only 1\% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.}
}

Endnote

%0 Conference Paper
%T Radioactive data: tracing through training
%A Alexandre Sablayrolles
%A Matthijs Douze
%A Cordelia Schmid
%A Herve Jegou
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-sablayrolles20a
%I PMLR
%P 8326--8335
%U https://proceedings.mlr.press/v119/sablayrolles20a.html
%V 119
%X Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (Imagenet), with standard architectures (Resnet-18, VGG-16, Densenet-121) and training procedures, show that we detect radioactive data with high confidence (p<0.0001) when only 1% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.

APA


Sablayrolles, A., Douze, M., Schmid, C. & Jegou, H. (2020). Radioactive data: tracing through training. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8326-8335. Available from https://proceedings.mlr.press/v119/sablayrolles20a.html.
