Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

Francesco Croce, Matthias Hein
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2196-2205, 2020.

Abstract

The robustness of neural network-based classifiers against adversarial manipulation is mainly evaluated with empirical attacks, as methods for exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack with respect to the $l_p$-norms for $p \in \{1,2,\infty\}$ that aims to find the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, quickly yields high-quality results, and minimizes the size of the perturbation (so that a single run returns the robust accuracy at every threshold). It performs better than or similarly to state-of-the-art attacks, which are partially specialized to one $l_p$-norm, and is robust to the phenomenon of gradient obfuscation.
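
The remark that one run gives the robust accuracy at every threshold can be illustrated with a minimal sketch. It assumes the attack has already produced, for each test input, the norm of the smallest adversarial perturbation it found (with `float("inf")` where none was found). The function name `robust_accuracy` and the sample numbers are hypothetical and are not taken from the paper or its code. Because the attack is empirical, the values computed this way are upper bounds on the true robust accuracy.

```python
from typing import Sequence


def robust_accuracy(min_norms: Sequence[float],
                    correct: Sequence[bool],
                    eps: float) -> float:
    """Fraction of inputs that are correctly classified and whose smallest
    adversarial perturbation found by the attack has norm greater than eps."""
    if len(min_norms) != len(correct):
        raise ValueError("min_norms and correct must have the same length")
    if len(min_norms) == 0:
        raise ValueError("need at least one input")
    # An input counts as robust at eps only if no adversarial example
    # with norm <= eps was found for it.
    robust = sum(1 for d, ok in zip(min_norms, correct) if ok and d > eps)
    return robust / len(min_norms)


# Hypothetical per-input results from a single run of a minimum-norm attack.
min_norms = [0.10, 0.35, 0.50, 0.05, 0.80]
correct = [True, True, True, False, True]  # clean predictions

# The same attack output is reused for every threshold; no re-run is needed.
curve = {eps: robust_accuracy(min_norms, correct, eps)
         for eps in (0.0, 0.2, 0.4, 0.6)}
print(curve)
```

An attack that only tests a fixed budget would have to be re-run for each threshold, whereas here the whole curve comes from one list of per-input norms.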

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-croce20a,
  title = {Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack},
  author = {Croce, Francesco and Hein, Matthias},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {2196--2205},
  year = {2020},
  editor = {Daumé III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/croce20a/croce20a.pdf},
  url = {https://proceedings.mlr.press/v119/croce20a.html},
  abstract = {The evaluation of robustness against adversarial manipulation of neural networks-based classifiers is mainly tested with empirical attacks as methods for the exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack wrt the $l_p$-norms for $p \in \{1,2,\infty\}$ aiming at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, yields quickly high quality results, minimizes the size of the perturbation (so that it returns the robust accuracy at every threshold with a single run). It performs better or similar to state-of-the-art attacks which are partially specialized to one $l_p$-norm, and is robust to the phenomenon of gradient obfuscation.}
}
Endnote
%0 Conference Paper
%T Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack
%A Francesco Croce
%A Matthias Hein
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-croce20a
%I PMLR
%P 2196--2205
%U https://proceedings.mlr.press/v119/croce20a.html
%V 119
%X The evaluation of robustness against adversarial manipulation of neural networks-based classifiers is mainly tested with empirical attacks as methods for the exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack wrt the $l_p$-norms for $p \in \{1,2,\infty\}$ aiming at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, yields quickly high quality results, minimizes the size of the perturbation (so that it returns the robust accuracy at every threshold with a single run). It performs better or similar to state-of-the-art attacks which are partially specialized to one $l_p$-norm, and is robust to the phenomenon of gradient obfuscation.
APA
Croce, F. & Hein, M. (2020). Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2196-2205. Available from https://proceedings.mlr.press/v119/croce20a.html.
