Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7614-7623, 2019.
In this paper, we explore clean-label poisoning attacks on deep convolutional networks with access to neither the network’s output nor its architecture or parameters. Our goal is to ensure that after injecting the poisons into the training data, a model with unknown architecture and parameters trained on that data will misclassify the target image into a specific class. To achieve this goal, we generate multiple poison images from the base class by adding small perturbations which cause the poison images to trap the target image within their convex polytope in feature space. We also demonstrate that using Dropout during crafting of the poisons and enforcing this objective in multiple layers enhances transferability, enabling attacks against both the transfer learning and end-to-end training settings. We demonstrate transferable attack success rates of over 50% by poisoning only 1% of the training set.