[edit]

# Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

*Proceedings of the 39th International Conference on Machine Learning*, PMLR 162:21692-21702, 2022.

#### Abstract

Making classifiers robust to adversarial examples is challenging. Thus, many works tackle the seemingly easier task of

*detecting*perturbed inputs. We show a barrier towards this goal. We prove a*hardness reduction*between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we show how to build a similarly robust (but inefficient)*classifier*for attacks at distance $\epsilon/2$. Our reduction is*computationally*inefficient, but preserves the*data complexity*of the original detector. The reduction thus cannot be directly used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated (namely a highly robust and data-efficient*classifier*). To illustrate, we revisit $14$ empirical detector defenses published over the past years. For $12/14$ defenses, we show that the claimed detection results imply an inefficient classifier with robustness far beyond the state-of-the-art— thus casting some doubts on the results’ validity. Finally, we show that our reduction applies in both directions: a robust classifier for attacks at distance $\epsilon/2$ implies an inefficient robust detector at distance $\epsilon$. Thus, we argue that robust classification and robust detection should be regarded as (near)-equivalent problems, if we disregard their*computational*complexity.