Origins of Low-Dimensional Adversarial Perturbations

Elvis Dohmatob, Chuan Guo, Morgane Goibert
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:9221-9237, 2023.

Abstract

Machine learning models are known to be susceptible to adversarial perturbations. Even more concerning is the fact that these adversarial perturbations can be found by black-box search using surprisingly few queries, which essentially restricts the perturbation to a subspace of dimension $k$—much smaller than the dimension $d$ of the image space. This intriguing phenomenon raises the question: Is the vulnerability to black-box attacks inherent or can we hope to prevent them? In this paper, we initiate a rigorous study of the phenomenon of low-dimensional adversarial perturbations (LDAPs). Our result characterizes precisely the sufficient conditions for the existence of LDAPs, and we show that these conditions hold for neural networks under practical settings, including the so-called lazy regime wherein the parameters of the trained network remain close to their values at initialization. Our theoretical results are confirmed by experiments on both synthetic and real data.
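To make the phenomenon concrete, here is a minimal sketch of what a query-based black-box search confined to a $k$-dimensional subspace might look like. This is illustrative only, not the authors' algorithm: the names (`subspace_blackbox_attack`, `predict`) and the greedy coordinate-wise search are assumptions made for the example.

import numpy as np

def subspace_blackbox_attack(predict, x, k, eps=0.5, n_queries=1000, seed=0):
    # `predict` is the only access to the model: it returns the model's
    # confidence in the true label of the input (a scalar); no gradients used.
    rng = np.random.default_rng(seed)
    d = x.size
    # Fix a random k-dimensional subspace of the d-dimensional input space
    # (orthonormal basis via QR of a Gaussian matrix).
    basis, _ = np.linalg.qr(rng.standard_normal((d, k)))  # shape (d, k)
    coords = np.zeros(k)       # perturbation expressed in subspace coordinates
    best = predict(x)          # current true-label confidence
    for _ in range(n_queries):
        step = np.zeros(k)
        step[rng.integers(k)] = rng.choice([-eps, eps])  # perturb one coordinate
        candidate = coords + step
        p = predict(x + (basis @ candidate).reshape(x.shape))
        if p < best:           # keep moves that reduce true-label confidence
            best, coords = p, candidate
    return x + (basis @ coords).reshape(x.shape)         # perturbed input

When $k$ is much smaller than $d$, the search runs over only $k$ coordinates rather than $d$, which is why such attacks can succeed with surprisingly few queries.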

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-dohmatob23a,
  title     = {Origins of Low-Dimensional Adversarial Perturbations},
  author    = {Dohmatob, Elvis and Guo, Chuan and Goibert, Morgane},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {9221--9237},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/dohmatob23a/dohmatob23a.pdf},
  url       = {https://proceedings.mlr.press/v206/dohmatob23a.html},
  abstract  = {Machine learning models are known to be susceptible to adversarial perturbations. Even more concerning is the fact that these adversarial perturbations can be found by black-box search using surprisingly few queries, which essentially restricts the perturbation to a subspace of dimension $k$—much smaller than the dimension $d$ of the image space. This intriguing phenomenon raises the question: Is the vulnerability to black-box attacks inherent or can we hope to prevent them? In this paper, we initiate a rigorous study of the phenomenon of low-dimensional adversarial perturbations (LDAPs). Our result characterizes precisely the sufficient conditions for the existence of LDAPs, and we show that these conditions hold for neural networks under practical settings, including the so-called lazy regime wherein the parameters of the trained network remain close to their values at initialization. Our theoretical results are confirmed by experiments on both synthetic and real data.}
}
Endnote
%0 Conference Paper
%T Origins of Low-Dimensional Adversarial Perturbations
%A Elvis Dohmatob
%A Chuan Guo
%A Morgane Goibert
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-dohmatob23a
%I PMLR
%P 9221--9237
%U https://proceedings.mlr.press/v206/dohmatob23a.html
%V 206
%X Machine learning models are known to be susceptible to adversarial perturbations. Even more concerning is the fact that these adversarial perturbations can be found by black-box search using surprisingly few queries, which essentially restricts the perturbation to a subspace of dimension $k$—much smaller than the dimension $d$ of the image space. This intriguing phenomenon raises the question: Is the vulnerability to black-box attacks inherent or can we hope to prevent them? In this paper, we initiate a rigorous study of the phenomenon of low-dimensional adversarial perturbations (LDAPs). Our result characterizes precisely the sufficient conditions for the existence of LDAPs, and we show that these conditions hold for neural networks under practical settings, including the so-called lazy regime wherein the parameters of the trained network remain close to their values at initialization. Our theoretical results are confirmed by experiments on both synthetic and real data.
APA
Dohmatob, E., Guo, C. & Goibert, M. (2023). Origins of Low-Dimensional Adversarial Perturbations. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:9221-9237. Available from https://proceedings.mlr.press/v206/dohmatob23a.html.
