Radioactive data: tracing through training

Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Herve Jegou
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8326-8335, 2020.

Abstract

Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (Imagenet), with standard architectures (Resnet-18, VGG-16, Densenet-121) and training procedures, show that we detect radioactive data with high confidence (p<0.0001) when only 1% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-sablayrolles20a,
  title     = {Radioactive data: tracing through training},
  author    = {Sablayrolles, Alexandre and Douze, Matthijs and Schmid, Cordelia and Jegou, Herve},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {8326--8335},
  year      = {2020},
  editor    = {Daumé III, Hal and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/sablayrolles20a/sablayrolles20a.pdf},
  url       = {https://proceedings.mlr.press/v119/sablayrolles20a.html},
  abstract  = {Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (Imagenet), with standard architectures (Resnet-18, VGG-16, Densenet-121) and training procedures, show that we detect radioactive data with high confidence (p<0.0001) when only 1% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.}
}
Endnote
%0 Conference Paper
%T Radioactive data: tracing through training
%A Alexandre Sablayrolles
%A Matthijs Douze
%A Cordelia Schmid
%A Herve Jegou
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-sablayrolles20a
%I PMLR
%P 8326--8335
%U https://proceedings.mlr.press/v119/sablayrolles20a.html
%V 119
%X Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (Imagenet), with standard architectures (Resnet-18, VGG-16, Densenet-121) and training procedures, show that we detect radioactive data with high confidence (p<0.0001) when only 1% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.
APA
Sablayrolles, A., Douze, M., Schmid, C., & Jegou, H. (2020). Radioactive data: tracing through training. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8326-8335. Available from https://proceedings.mlr.press/v119/sablayrolles20a.html.
