Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees

Darshan Thaker, Paris Giampouras, Rene Vidal
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:21253-21271, 2022.

Abstract

Deep neural network-based classifiers have been shown to be vulnerable to imperceptible perturbations to their input, such as $\ell_p$ norm-bounded adversarial attacks. This has motivated the development of many defense methods, which are then broken by new attacks, and so on. This paper focuses on a different but related problem: reverse engineering adversarial attacks. Specifically, given an attacked signal, we study conditions under which one can determine the type of attack ($\ell_1$, $\ell_2$, or $\ell_\infty$) and recover the clean signal. We pose this as a block-sparse recovery problem, where both the signal and the attack are assumed to lie in a union of subspaces that includes one subspace per class and one subspace per attack type. We derive geometric conditions on the subspaces under which any attacked signal can be decomposed as the sum of a clean signal plus an attack. In addition, by determining the subspaces that contain the signal and the attack, we can also classify the signal and determine the attack type. Experiments on digit and face classification demonstrate the effectiveness of the proposed approach.
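As a quick illustration of the formulation described in the abstract, below is a minimal sketch (not the authors' released code) of the block-sparse decomposition idea: approximate the attacked signal y as Ds @ cs + Da @ ca, where Ds stacks one block of columns per class and Da one block per attack type ($\ell_1$, $\ell_2$, $\ell_\infty$), with group-lasso penalties encouraging each coefficient vector to concentrate on a single block. The dictionaries, block partitions, solver, and regularization weights here are illustrative assumptions, not the paper's exact algorithm.

import numpy as np
import cvxpy as cp

def reverse_engineer(y, Ds, signal_blocks, Da, attack_blocks,
                     lam_s=0.1, lam_a=0.1):
    # Hypothetical sketch: solve
    #   min ||y - Ds cs - Da ca||_2^2
    #       + lam_s * sum_b ||cs[b]||_2 + lam_a * sum_b ||ca[b]||_2
    # signal_blocks / attack_blocks are lists of column-index arrays,
    # one per block (one per class / one per attack type).
    cs = cp.Variable(Ds.shape[1])
    ca = cp.Variable(Da.shape[1])
    objective = (cp.sum_squares(y - Ds @ cs - Da @ ca)
                 + lam_s * sum(cp.norm(cs[b], 2) for b in signal_blocks)
                 + lam_a * sum(cp.norm(ca[b], 2) for b in attack_blocks))
    cp.Problem(cp.Minimize(objective)).solve()
    # The block carrying the most coefficient energy identifies the
    # class / attack type; the signal part gives the clean estimate.
    pred_class = int(np.argmax([np.linalg.norm(cs.value[b])
                                for b in signal_blocks]))
    pred_attack = int(np.argmax([np.linalg.norm(ca.value[b])
                                 for b in attack_blocks]))
    return Ds @ cs.value, pred_class, pred_attack

In this sketch the active blocks play the role of the subspaces in the abstract: the dominant signal block gives the class label, the dominant attack block gives the attack type, and Ds @ cs.value is the recovered clean signal.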

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-thaker22a,
  title     = {Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees},
  author    = {Thaker, Darshan and Giampouras, Paris and Vidal, Rene},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {21253--21271},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/thaker22a/thaker22a.pdf},
  url       = {https://proceedings.mlr.press/v162/thaker22a.html}
}
Endnote
%0 Conference Paper
%T Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees
%A Darshan Thaker
%A Paris Giampouras
%A Rene Vidal
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-thaker22a
%I PMLR
%P 21253--21271
%U https://proceedings.mlr.press/v162/thaker22a.html
%V 162
APA
Thaker, D., Giampouras, P. & Vidal, R. (2022). Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:21253-21271. Available from https://proceedings.mlr.press/v162/thaker22a.html.

Related Material

Paper PDF: https://proceedings.mlr.press/v162/thaker22a/thaker22a.pdf