Improving Adversarial Robustness via Mutual Information Estimation

Dawei Zhou, Nannan Wang, Xinbo Gao, Bo Han, Xiaoyu Wang, Yibing Zhan, Tongliang Liu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:27338-27352, 2022.

Abstract

Deep neural networks (DNNs) have been found to be vulnerable to adversarial noise: they are typically misled by adversarial samples into making wrong predictions. To alleviate this negative effect, in this paper we investigate the dependence between the outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. Specifically, we first measure this dependence by estimating the mutual information (MI) between the outputs and the natural patterns of the inputs (called natural MI) and the MI between the outputs and the adversarial patterns of the inputs (called adversarial MI). We find that adversarial samples usually have larger adversarial MI and smaller natural MI than natural samples do. Motivated by this observation, we propose to enhance adversarial robustness by maximizing the natural MI and minimizing the adversarial MI during training. In this way, the target model is expected to pay more attention to the natural pattern, which contains the objective semantics. Empirical evaluations demonstrate that our method effectively improves adversarial accuracy against multiple attacks.
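For concreteness, the sketch below (PyTorch) illustrates what an objective of this kind could look like. It is a minimal illustration under assumptions not stated in the abstract: a MINE-style (Donsker-Varadhan) lower bound is used as the MI estimator, the "natural pattern" is identified with the clean input, and the "adversarial pattern" with the perturbation; the names MINEEstimator, robust_training_loss, alpha and beta are illustrative only and are not the paper's formulation.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MINEEstimator(nn.Module):
    # Statistics network T(z, v) for a Donsker-Varadhan lower bound on I(Z; V).
    def __init__(self, z_dim, v_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + v_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def lower_bound(self, z, v):
        # Joint term: paired samples (z_i, v_i).
        joint = self.net(torch.cat([z, v], dim=1)).mean()
        # Marginal term: shuffle v within the batch to break the pairing.
        v_perm = v[torch.randperm(v.size(0), device=v.device)]
        marginal = self.net(torch.cat([z, v_perm], dim=1))
        # I(Z; V) >= E_joint[T] - log E_marginal[exp(T)]
        return joint - (torch.logsumexp(marginal, dim=0) - math.log(marginal.size(0))).squeeze()

def robust_training_loss(model, mi_nat, mi_adv, x, x_adv, y, alpha=1.0, beta=1.0):
    # Cross-entropy on adversarial inputs, plus the two MI terms:
    # maximize natural MI (subtract it) and minimize adversarial MI (add it).
    logits = model(x_adv)
    natural_pattern = x.flatten(1)                 # assumed: the clean input
    adversarial_pattern = (x_adv - x).flatten(1)   # assumed: the perturbation
    nat_mi = mi_nat.lower_bound(logits, natural_pattern)
    adv_mi = mi_adv.lower_bound(logits, adversarial_pattern)
    return F.cross_entropy(logits, y) - alpha * nat_mi + beta * adv_mi

In practice the statistics networks would themselves be trained (in alternation with the classifier) to keep the lower bounds tight; the paper's actual estimator and its decomposition into natural and adversarial patterns may differ from this sketch.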

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-zhou22j,
  title     = {Improving Adversarial Robustness via Mutual Information Estimation},
  author    = {Zhou, Dawei and Wang, Nannan and Gao, Xinbo and Han, Bo and Wang, Xiaoyu and Zhan, Yibing and Liu, Tongliang},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {27338--27352},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/zhou22j/zhou22j.pdf},
  url       = {https://proceedings.mlr.press/v162/zhou22j.html}
}
APA
Zhou, D., Wang, N., Gao, X., Han, B., Wang, X., Zhan, Y. & Liu, T. (2022). Improving Adversarial Robustness via Mutual Information Estimation. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:27338-27352. Available from https://proceedings.mlr.press/v162/zhou22j.html.
