Towards Defending against Adversarial Examples via Attack-Invariant Features

Dawei Zhou, Tongliang Liu, Bo Han, Nannan Wang, Chunlei Peng, Xinbo Gao
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12835-12845, 2021.

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial noise. Their adversarial robustness can be improved by exploiting adversarial examples. However, given continuously evolving attacks, models trained on seen types of adversarial examples generally do not generalize well to unseen types. To solve this problem, we propose to remove adversarial noise by learning generalizable attack-invariant features that preserve semantic classification information. Specifically, we introduce an adversarial feature learning mechanism to disentangle invariant features from adversarial noise, and a normalization term in the encoded space of the attack-invariant features to address the bias between seen and unseen types of attacks. Empirical evaluations demonstrate that our method provides better protection than previous state-of-the-art approaches, especially against unseen types of attacks and adaptive attacks.
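The abstract only sketches the approach at a high level. As a rough, hypothetical illustration (not the authors' implementation), the snippet below shows one way an encoder could map adversarial inputs into an attack-invariant feature space that a decoder then uses to reconstruct a denoised image. Here, a simple L2 feature-alignment loss stands in for the adversarial feature learning mechanism and F.normalize stands in for the normalization term in the encoded space; all module names, architectures, and loss weights are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InvariantEncoder(nn.Module):
    """Maps (possibly adversarial) images into an assumed attack-invariant feature space."""
    def __init__(self, channels=3, feat_dim=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, feat_dim, 3, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.body(x)
        # Normalize encoded features; an assumed stand-in for the paper's
        # normalization term that limits attack-specific bias.
        return F.normalize(z, dim=1)

class Decoder(nn.Module):
    """Reconstructs a denoised image from the encoded features (output in [-1, 1])."""
    def __init__(self, channels=3, feat_dim=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.body(z)

def denoising_loss(encoder, decoder, x_clean, x_adv, lam=1.0):
    """Align adversarial features with clean features and reconstruct a denoised
    image; an illustrative surrogate for the paper's objective, not its exact form."""
    z_clean = encoder(x_clean)
    z_adv = encoder(x_adv)
    x_rec = decoder(z_adv)
    feat_align = F.mse_loss(z_adv, z_clean)   # attack-invariance term
    pixel_rec = F.mse_loss(x_rec, x_clean)    # denoising / reconstruction term
    return pixel_rec + lam * feat_align

In this sketch, adversarial examples x_adv would be generated from seen attack types during training, and the hope (as described in the abstract) is that the normalized, clean-aligned feature space transfers to unseen attacks.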

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-zhou21e,
  title     = {Towards Defending against Adversarial Examples via Attack-Invariant Features},
  author    = {Zhou, Dawei and Liu, Tongliang and Han, Bo and Wang, Nannan and Peng, Chunlei and Gao, Xinbo},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {12835--12845},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/zhou21e/zhou21e.pdf},
  url       = {https://proceedings.mlr.press/v139/zhou21e.html}
}
Endnote
%0 Conference Paper
%T Towards Defending against Adversarial Examples via Attack-Invariant Features
%A Dawei Zhou
%A Tongliang Liu
%A Bo Han
%A Nannan Wang
%A Chunlei Peng
%A Xinbo Gao
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zhou21e
%I PMLR
%P 12835--12845
%U https://proceedings.mlr.press/v139/zhou21e.html
%V 139
APA
Zhou, D., Liu, T., Han, B., Wang, N., Peng, C., & Gao, X. (2021). Towards Defending against Adversarial Examples via Attack-Invariant Features. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12835-12845. Available from https://proceedings.mlr.press/v139/zhou21e.html.