Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks

Lei Wu, Zhanxing Zhu
Proceedings of The 12th Asian Conference on Machine Learning, PMLR 129:837-850, 2020.

Abstract

It is now well known that deep neural networks are vulnerable to adversarial examples, constructed by applying small but malicious perturbations to the original inputs. Moreover, the perturbed inputs can transfer between different models: adversarial examples generated on a specific model will often fool other unseen models with a significant success rate. This allows an adversary to attack deployed systems without issuing any queries, which raises severe security issues, particularly in safety-critical scenarios. In this work, we empirically investigate two classes of factors that might influence the transferability of adversarial examples. The first concerns model-specific factors, including network architecture, model capacity, and test accuracy. The second is the local smoothness of the loss surface used for generating adversarial examples. Building on these findings, we propose a simple but effective strategy for improving transferability, whose effectiveness is confirmed through extensive experiments on both the CIFAR-10 and ImageNet datasets.
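To make the transferability setting concrete, below is a minimal PyTorch sketch of a transfer attack: adversarial examples are crafted on a white-box source model using the standard single-step FGSM attack (one common choice for illustration, not necessarily the strategy proposed in the paper), then evaluated on an unseen target model. The function names, the perturbation budget `eps`, and the choice of FGSM are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Craft FGSM adversarial examples on a white-box source model.

    x: input batch in [0, 1]; y: true labels; eps: perturbation budget.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed-gradient ascent step, clipped back to the valid pixel range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

@torch.no_grad()
def transfer_success_rate(target_model, x_adv, y):
    """Fraction of adversarial examples that fool an unseen target model."""
    preds = target_model(x_adv).argmax(dim=1)
    return (preds != y).float().mean().item()

# Example usage (source_model, target_model, images, labels are hypothetical;
# both models should be in eval() mode):
# x_adv = fgsm_attack(source_model, images, labels, eps=8 / 255)
# rate = transfer_success_rate(target_model, x_adv, labels)
```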

Cite this Paper


BibTeX
@InProceedings{pmlr-v129-wu20a,
  title     = {Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks},
  author    = {Wu, Lei and Zhu, Zhanxing},
  booktitle = {Proceedings of The 12th Asian Conference on Machine Learning},
  pages     = {837--850},
  year      = {2020},
  editor    = {Pan, Sinno Jialin and Sugiyama, Masashi},
  volume    = {129},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--20 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v129/wu20a/wu20a.pdf},
  url       = {https://proceedings.mlr.press/v129/wu20a.html}
}
Endnote
%0 Conference Paper
%T Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks
%A Lei Wu
%A Zhanxing Zhu
%B Proceedings of The 12th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Sinno Jialin Pan
%E Masashi Sugiyama
%F pmlr-v129-wu20a
%I PMLR
%P 837--850
%U https://proceedings.mlr.press/v129/wu20a.html
%V 129
APA
Wu, L. & Zhu, Z. (2020). Towards Understanding and Improving the Transferability of Adversarial Examples in Deep Neural Networks. Proceedings of The 12th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 129:837-850. Available from https://proceedings.mlr.press/v129/wu20a.html.