Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

Florian Tramer
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:21692-21702, 2022.

Abstract

Making classifiers robust to adversarial examples is challenging. Thus, many works tackle the seemingly easier task of detecting perturbed inputs. We show a barrier towards this goal. We prove a hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we show how to build a similarly robust (but inefficient) classifier for attacks at distance $\epsilon/2$. Our reduction is computationally inefficient, but preserves the data complexity of the original detector. The reduction thus cannot be directly used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated (namely a highly robust and data-efficient classifier). To illustrate, we revisit $14$ empirical detector defenses published over the past years. For $12/14$ defenses, we show that the claimed detection results imply an inefficient classifier with robustness far beyond the state-of-the-art— thus casting some doubts on the results’ validity. Finally, we show that our reduction applies in both directions: a robust classifier for attacks at distance $\epsilon/2$ implies an inefficient robust detector at distance $\epsilon$. Thus, we argue that robust classification and robust detection should be regarded as (near)-equivalent problems, if we disregard their computational complexity.
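To make the reduction concrete, here is a minimal Python sketch (not code from the paper): the function robust_classify, its classify/detect callables, the l_inf metric, the [0, 1] input range, and the random-sampling search are all illustrative assumptions. The paper's reduction performs an exhaustive, computationally inefficient search over the eps/2 ball, which the sampling below can only approximate.

    import numpy as np

    def robust_classify(x, classify, detect, eps, n_samples=10000, rng=None):
        """Illustrative sketch of building a classifier from a robust detector.

        classify(z) -> predicted label for input z
        detect(z)   -> True if the detector flags z as adversarial
        eps         -> the detector's claimed robustness radius (l_inf here)
        """
        rng = np.random.default_rng(0) if rng is None else rng
        # Try the input itself first, then random points in the eps/2 ball
        # around it (clipping assumes inputs lie in [0, 1]).
        candidates = [x] + [
            np.clip(x + rng.uniform(-eps / 2, eps / 2, size=x.shape), 0.0, 1.0)
            for _ in range(n_samples)
        ]
        for x_hat in candidates:
            if not detect(x_hat):
                # x_hat lies within eps/2 of x, hence within eps of the clean
                # point that x was perturbed from; an eps-robust detector must
                # then either flag x_hat or have it classified correctly.
                return classify(x_hat)
        return None  # no accepted point found: abstain

The intuition, following the abstract: if x is within eps/2 of a clean input, that clean input is itself a valid search candidate that the detector accepts, so the search succeeds; and any accepted candidate lies within eps of the clean input, so the detector's eps-robustness forces the base classifier to label it correctly. The resulting classifier is therefore robust at distance eps/2, at the cost of an impractical search.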

Cite this Paper

BibTeX
@InProceedings{pmlr-v162-tramer22a,
  title     = {Detecting Adversarial Examples Is ({N}early) As Hard As Classifying Them},
  author    = {Tramer, Florian},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {21692--21702},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/tramer22a/tramer22a.pdf},
  url       = {https://proceedings.mlr.press/v162/tramer22a.html}
}
Endnote
%0 Conference Paper
%T Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them
%A Florian Tramer
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-tramer22a
%I PMLR
%P 21692--21702
%U https://proceedings.mlr.press/v162/tramer22a.html
%V 162
APA
Tramer, F. (2022). Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:21692-21702. Available from https://proceedings.mlr.press/v162/tramer22a.html.
