Defending against Whitebox Adversarial Attacks via Randomized Discretization

Yuchen Zhang; Percy Liang

Defending against Whitebox Adversarial Attacks via Randomized Discretization

Yuchen Zhang, Percy Liang

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:684-693, 2019.

Abstract

Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence between original and adversarial inputs, leading to a lower bound on the classification accuracy of any classifier against any (potentially whitebox) $L_{\infty}$-bounded adversarial attack. Empirically, we evaluate our defense on adversarial examples generated by a strong iterative PGD attack. On ImageNet, our defense is more robust than adversarially-trained networks and the winning defenses of the NIPS 2017 Adversarial Attacks & Defenses competition.

Cite this Paper

BibTeX

@InProceedings{pmlr-v89-zhang19b,
  title = 	 {Defending against Whitebox Adversarial Attacks via Randomized Discretization},
  author =       {Zhang, Yuchen and Liang, Percy},
  booktitle = 	 {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages = 	 {684--693},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = 	 {89},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {16--18 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v89/zhang19b/zhang19b.pdf},
  url = 	 {https://proceedings.mlr.press/v89/zhang19b.html},
  abstract = 	 {Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence between original and adversarial inputs, leading to a lower bound on the classification accuracy of any classifier against any (potentially whitebox) $L_{\infty}$-bounded adversarial attack. Empirically, we evaluate our defense on adversarial examples generated by a strong iterative PGD attack. On ImageNet, our defense is more robust than adversarially-trained networks and the winning defenses of the NIPS 2017 Adversarial Attacks & Defenses competition.}
}

Endnote

%0 Conference Paper
%T Defending against Whitebox Adversarial Attacks via Randomized Discretization
%A Yuchen Zhang
%A Percy Liang
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama	
%F pmlr-v89-zhang19b
%I PMLR
%P 684--693
%U https://proceedings.mlr.press/v89/zhang19b.html
%V 89
%X Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence between original and adversarial inputs, leading to a lower bound on the classification accuracy of any classifier against any (potentially whitebox) $L_{\infty}$-bounded adversarial attack. Empirically, we evaluate our defense on adversarial examples generated by a strong iterative PGD attack. On ImageNet, our defense is more robust than adversarially-trained networks and the winning defenses of the NIPS 2017 Adversarial Attacks & Defenses competition.

APA

Zhang, Y. & Liang, P.. (2019). Defending against Whitebox Adversarial Attacks via Randomized Discretization. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:684-693 Available from https://proceedings.mlr.press/v89/zhang19b.html.

Related Material

Download PDF