Origins of Low-Dimensional Adversarial Perturbations
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:9221-9237, 2023.
Abstract
Machine learning models are known to be susceptible to adversarial perturbations. Even more concerning is the fact that these adversarial perturbations can be found by black-box search using surprisingly few queries, which essentially restricts the perturbation to a subspace of dimension $k$, much smaller than the dimension $d$ of the image space. This intriguing phenomenon raises the question: is the vulnerability to black-box attacks inherent, or can we hope to prevent such attacks? In this paper, we initiate a rigorous study of the phenomenon of low-dimensional adversarial perturbations (LDAPs). Our results precisely characterize sufficient conditions for the existence of LDAPs, and we show that these conditions hold for neural networks under practical settings, including the so-called lazy regime, in which the parameters of the trained network remain close to their values at initialization. Our theoretical results are confirmed by experiments on both synthetic and real data.
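
To make the phenomenon concrete, the following is a minimal, hypothetical sketch of a black-box search confined to a random $k$-dimensional subspace of the $d$-dimensional input space. The toy model `f`, the step size, the norm budget, and the query budget are illustrative assumptions only; this is not the construction or attack analyzed in the paper.

```python
# Hypothetical sketch: random-search black-box perturbation restricted to a
# k-dimensional subspace (k << d). All components below are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)

d, k = 3072, 20  # ambient input dimension vs. dimension of the search subspace
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal basis of a random k-dim subspace

def f(x):
    """Placeholder black-box score for the true class (lower = more adversarial)."""
    w = np.ones(d) / np.sqrt(d)  # toy linear model standing in for a trained network
    return float(w @ x)

def low_dim_random_search(x, eps=0.5, queries=200):
    """Random search for a perturbation delta = U @ z with ||delta|| <= eps."""
    best_z, best_score = np.zeros(k), f(x)
    for _ in range(queries):
        z = best_z + 0.1 * rng.standard_normal(k)          # propose a step inside the subspace
        z *= min(1.0, eps / (np.linalg.norm(z) + 1e-12))   # project back into the eps-ball
        score = f(x + U @ z)                                # one black-box query
        if score < best_score:
            best_z, best_score = z, score
    return U @ best_z, best_score

x0 = rng.standard_normal(d)
delta, score = low_dim_random_search(x0)
print(f"perturbation norm = {np.linalg.norm(delta):.3f}, score drop = {f(x0) - score:.3f}")
```

Because the columns of `U` are orthonormal, the search over the $k$-dimensional coefficient vector `z` never leaves the chosen subspace, which is the query-efficiency mechanism the abstract alludes to.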