Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

Chen Zhu, W. Ronny Huang, Hengduo Li, Gavin Taylor, Christoph Studer, Tom Goldstein
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7614-7623, 2019.

Abstract

In this paper, we explore clean-label poisoning attacks on deep convolutional networks with access to neither the network’s output nor its architecture or parameters. Our goal is to ensure that after injecting the poisons into the training data, a model with unknown architecture and parameters trained on that data will misclassify the target image into a specific class. To achieve this goal, we generate multiple poison images from the base class by adding small perturbations which cause the poison images to trap the target image within their convex polytope in feature space. We also demonstrate that using Dropout during crafting of the poisons and enforcing this objective in multiple layers enhances transferability, enabling attacks against both the transfer learning and end-to-end training settings. We demonstrate transferable attack success rates of over 50% by poisoning only 1% of the training set.
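For readers who want a concrete picture of the crafting objective described above, the sketch below illustrates the convex-polytope loss in PyTorch. It is a minimal illustration under stated assumptions, not the authors' released implementation: the names (feat_extractor, craft_poisons, eps, and so on) are hypothetical, the convex-combination coefficients are parameterized with a softmax rather than the paper's explicitly projected weights, and the objective is enforced at a single feature layer of one surrogate model, whereas the paper enforces it in multiple layers and over an ensemble of surrogate networks. Keeping the surrogate in train() mode reflects the paper's observation that active Dropout during crafting improves transferability.

import torch

def convex_polytope_loss(feat_extractor, poisons, target, coeff_logits):
    # Distance between the target's feature and the convex combination of poison features.
    c = torch.softmax(coeff_logits, dim=0)         # convex weights, kept on the simplex
    phi_p = feat_extractor(poisons)                # (k, d) features of the k poison images
    phi_t = feat_extractor(target)                 # (1, d) feature of the target image
    combo = (c.unsqueeze(1) * phi_p).sum(dim=0, keepdim=True)
    return ((combo - phi_t) ** 2).sum() / (phi_t ** 2).sum()

def craft_poisons(feat_extractor, base_images, target, eps=8/255, steps=1000, lr=0.01):
    # base_images: (k, C, H, W) clean base-class images; target: (1, C, H, W) target image.
    feat_extractor.train()                         # keep Dropout active while crafting
    poisons = base_images.clone().detach().requires_grad_(True)
    coeff_logits = torch.zeros(base_images.size(0), requires_grad=True)
    opt = torch.optim.Adam([poisons, coeff_logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = convex_polytope_loss(feat_extractor, poisons, target, coeff_logits)
        loss.backward()
        opt.step()
        with torch.no_grad():
            # Project the perturbation back into the epsilon-ball and the valid pixel range,
            # so the poisons stay visually close to the clean base images (clean-label constraint).
            poisons.data = base_images + (poisons.data - base_images).clamp(-eps, eps)
            poisons.data = poisons.data.clamp(0, 1)
    return poisons.detach()

In use, the returned poisons would simply be injected into the victim's training set with their original base-class labels; the paper's multi-layer, multi-model variant sums this loss over several layers and surrogate networks before backpropagating.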

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-zhu19a,
  title     = {Transferable Clean-Label Poisoning Attacks on Deep Neural Nets},
  author    = {Zhu, Chen and Huang, W. Ronny and Li, Hengduo and Taylor, Gavin and Studer, Christoph and Goldstein, Tom},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {7614--7623},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/zhu19a/zhu19a.pdf},
  url       = {https://proceedings.mlr.press/v97/zhu19a.html}
}
Endnote
%0 Conference Paper
%T Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
%A Chen Zhu
%A W. Ronny Huang
%A Hengduo Li
%A Gavin Taylor
%A Christoph Studer
%A Tom Goldstein
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-zhu19a
%I PMLR
%P 7614--7623
%U https://proceedings.mlr.press/v97/zhu19a.html
%V 97
APA
Zhu, C., Huang, W. R., Li, H., Taylor, G., Studer, C., & Goldstein, T. (2019). Transferable Clean-Label Poisoning Attacks on Deep Neural Nets. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7614-7623. Available from https://proceedings.mlr.press/v97/zhu19a.html.
