Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks
Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:837-850, 2020.
Abstract
It is now well known that deep neural networks are vulnerable to adversarial examples, which are constructed by applying small but malicious perturbations to the original inputs. Moreover, the perturbed inputs can transfer between different models: adversarial examples generated on a specific model will often fool other, unseen models with a significant success rate. This allows an adversary to attack deployed systems without issuing any queries, which could raise severe security issues, particularly in safety-critical scenarios. In this work, we empirically investigate two classes of factors that might influence the transferability of adversarial examples. The first consists of model-specific factors, including network architecture, model capacity, and test accuracy. The second is the local smoothness of the loss surface used for generating adversarial examples. Building on these findings, we propose a simple but effective strategy for improving transferability, whose effectiveness is confirmed through extensive experiments on both the CIFAR-10 and ImageNet datasets.
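To make the transfer setting concrete, the sketch below (not the paper's proposed strategy) crafts a single-step FGSM adversarial example on a source model and checks whether it also fools a separate, unseen target model. The choice of models, the epsilon value, and the placeholder input are illustrative assumptions only.

```python
# Hedged sketch of an adversarial transfer check, assuming torchvision >= 0.13
# with pretrained ImageNet weights available.
import torch
import torch.nn.functional as F
from torchvision import models

def fgsm(model, x, y, eps):
    """One-step L_inf attack: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

source = models.resnet18(weights="IMAGENET1K_V1").eval()     # model used to craft the attack
target = models.densenet121(weights="IMAGENET1K_V1").eval()  # unseen model under attack

x = torch.rand(1, 3, 224, 224)   # placeholder input; a real, preprocessed image would be used in practice
y = source(x).argmax(dim=1)      # treat the source model's prediction as the label
x_adv = fgsm(source, x, y, eps=8 / 255)

print("source fooled:", (source(x_adv).argmax(1) != y).item())
print("transfers to target:", (target(x_adv).argmax(1) != y).item())
```

An adversary with no query access to the target model could use exactly this kind of surrogate-based attack, which is why transferability matters in practice.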