Can Adversarially Robust Learning Leverage Computational Hardness?
Proceedings of the 30th International Conference on Algorithmic Learning Theory, PMLR 98:581-609, 2019.
Abstract
Making learners robust to adversarial perturbation at test time (i.e., evasion attacks finding adversarial examples) or training time (i.e., data poisoning attacks) has emerged as a challenging task. It is known that in some cases \emph{sublinear} perturbations in the training phase or the testing phase can drastically decrease the quality of the predictions. These negative results, however, only prove the \emph{existence} of such successful adversarial perturbations. A natural question for these settings is whether or not we can make classifiers \emph{computationally} robust to \emph{polynomial-time} attacks.
In this work, we prove some barriers against achieving such envisioned computational robustness for evasion attacks (for specific metric probability spaces) as well as poisoning attacks. In particular, we show that if the test instances come from a product distribution (e.g., uniform over $\{0,1\}^n$ or $[0,1]^n$, or isotropic $n$-variate Gaussian) and there is an initial constant error, then there exists a \emph{polynomial-time} attack that finds adversarial examples of Hamming distance $O(\sqrt n)$.
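For intuition, here is a standard concentration-of-measure fact, stated only for context and not taken from the paper: on the Hamming cube under the uniform measure $\mu$, the $r$-blowup $A_r$ (all points within Hamming distance $r$ of $A$) of any error set $A$ of constant measure already covers almost the whole space at radius $r = O(\sqrt n)$,
\[
  \mu(A) \ge \varepsilon \;\Longrightarrow\; \mu(A_r) \ge 1-\delta
  \qquad \text{whenever } r \ge \sqrt{\tfrac{n}{2}\ln\tfrac{1}{\varepsilon}} + \sqrt{\tfrac{n}{2}\ln\tfrac{1}{\delta}} .
\]
Hence, for constant $\varepsilon$ and $\delta$, an adversarial example within Hamming distance $O(\sqrt n)$ merely \emph{exists} for almost every test instance; the point of the result above is that such a perturbation can also be \emph{found} in polynomial time.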
For poisoning attacks, we prove that for any deterministic learning algorithm with sample complexity $m$ and any efficiently computable “predicate” defining some “bad” property $B$ for the produced hypothesis (e.g., failing on a particular test) that happens with an initial constant probability, there exists a \emph{polynomial-time} online poisoning attack that replaces $O(\sqrt m)$ of the training examples with other correctly labeled examples and increases the probability of the bad event $B$ to $\approx 1$.
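A similar existential calculation explains the $O(\sqrt m)$ scale; the following sketch is only for intuition and ignores the constraint that replacements be correctly labeled. Viewing the $m$ i.i.d. training examples as a point in a product space, measuring tampering by the number of replaced examples, and noting that for a deterministic learner $B$ is an event over the sample, the same blow-up inequality with $m$ in place of $n$ gives
\[
  \Pr[B] \ge \varepsilon \;\Longrightarrow\;
  \Pr\big[\text{$B$ is reachable by replacing at most } r \text{ examples}\big] \ge 1-\delta
  \qquad \text{for } r \ge \sqrt{\tfrac{m}{2}\ln\tfrac{1}{\varepsilon}} + \sqrt{\tfrac{m}{2}\ln\tfrac{1}{\delta}} ,
\]
so replacing $O(\sqrt m)$ examples suffices \emph{existentially} when $B$ has constant initial probability; the contribution stated above is an \emph{online, polynomial-time} attack achieving this while using only correctly labeled replacements.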
Both of our poisoning and evasion attacks are \emph{black-box} in how they access their corresponding components of the system (i.e., the hypothesis, the concept, and the learning algorithm).