Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

Huishuai Zhang, Da Yu, Yiping Lu, Di He
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:2792-2804, 2023.

Abstract

Adversarial examples, which are usually generated by adding imperceptible adversarial noise to clean samples, are ubiquitous for neural networks. In this paper we unveil a surprising property of adversarial noises when they are put together: adversarial noises crafted by one-step gradient methods are linearly separable when paired with their corresponding labels. We theoretically prove this property for a two-layer network with randomly initialized entries and for the neural tangent kernel setup, where the parameters stay close to initialization. The proof idea is to show that label information can be efficiently backpropagated to the input while preserving linear separability. Our theory and experimental evidence further show that a linear classifier trained on the adversarial noises of the training data can classify the adversarial noises of the test data well, indicating that adversarial noises actually inject a distributional perturbation into the original data distribution. Furthermore, we empirically demonstrate that adversarial noises may become less linearly separable when the above conditions are violated, though they remain much easier to classify than the original features.
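
A minimal PyTorch sketch of the experiment described above (not the authors' code): craft one-step, FGSM-style adversarial noises on a randomly initialized two-layer ReLU network, then train a linear classifier on the (noise, label) pairs and evaluate it on the noises of held-out data. The synthetic Gaussian inputs, random binary labels, network width, perturbation size eps, and optimizer settings are illustrative assumptions, not the paper's setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, width, eps = 100, 1024, 0.1

# Two-layer ReLU network with random (untrained) parameters.
net = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))

def one_step_noise(x, y):
    # One-step adversarial noise: eps * sign of the input gradient of the
    # logistic loss, for labels y in {-1, +1}.
    x = x.clone().requires_grad_(True)
    loss = F.softplus(-y * net(x).squeeze(-1)).sum()
    loss.backward()
    return eps * x.grad.sign()

def noises_and_labels(n):
    x = torch.randn(n, d)                            # synthetic "clean" inputs
    y = torch.randint(0, 2, (n,)).float() * 2 - 1    # random binary labels
    return one_step_noise(x, y), y

noise_tr, y_tr = noises_and_labels(2000)
noise_te, y_te = noises_and_labels(500)

# Train a linear classifier on the training noises alone.
lin = nn.Linear(d, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.5)
for _ in range(1000):
    opt.zero_grad()
    F.softplus(-y_tr * lin(noise_tr).squeeze(-1)).mean().backward()
    opt.step()

with torch.no_grad():
    train_acc = (lin(noise_tr).squeeze(-1).sign() == y_tr).float().mean()
    test_acc = (lin(noise_te).squeeze(-1).sign() == y_te).float().mean()
print(f"accuracy on train noises: {train_acc:.3f}, on test noises: {test_acc:.3f}")

Per the paper's claim, the linear classifier should separate the training noises and also transfer to the noises of held-out samples; the exact accuracies depend on the illustrative settings chosen above.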

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-zhang23d,
  title     = {Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks},
  author    = {Zhang, Huishuai and Yu, Da and Lu, Yiping and He, Di},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {2792--2804},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/zhang23d/zhang23d.pdf},
  url       = {https://proceedings.mlr.press/v206/zhang23d.html}
}
Endnote
%0 Conference Paper
%T Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks
%A Huishuai Zhang
%A Da Yu
%A Yiping Lu
%A Di He
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-zhang23d
%I PMLR
%P 2792--2804
%U https://proceedings.mlr.press/v206/zhang23d.html
%V 206
APA
Zhang, H., Yu, D., Lu, Y. & He, D. (2023). Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:2792-2804. Available from https://proceedings.mlr.press/v206/zhang23d.html.