Pixel2Feature Attack (P2FA): Rethinking the Perturbed Space to Enhance Adversarial Transferability
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:39853-39870, 2025.
Abstract
Adversarial examples have been shown to deceive Deep Neural Networks (DNNs), raising widespread concern about this security threat. More worryingly, because different DNN models share critical features, feature-level attacks can generate transferable adversarial examples that deceive black-box models in real-world scenarios. Nevertheless, we theoretically identify the principle behind the limited transferability of existing feature-level attacks: their effect is essentially equivalent to perturbing features in a single step along the direction of feature importance in the feature space, despite performing multiple perturbations in the pixel space. This finding indicates that existing feature-level attacks are inefficient at disrupting features through multiple pixel-space perturbations. To address this problem, we propose the Pixel2Feature Attack (P2FA), which perturbs features multiple times efficiently. Specifically, we shift the perturbed space directly from pixel space to feature space. We then perturb the features multiple times, rather than only once, under the guidance of feature importance, improving the efficiency of disrupting critical shared features. Finally, we invert the perturbed features back to pixel space to generate more transferable adversarial examples. Extensive experimental results demonstrate the superior transferability of P2FA over State-Of-The-Art (SOTA) attacks.
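For intuition, below is a minimal PyTorch sketch of the three-stage pipeline the abstract describes: estimate feature importance via aggregated gradients, take several perturbation steps directly in feature space, and then invert the perturbed features back to pixels under an L-infinity budget. All function names, the layer split, the losses, and the hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# A hedged sketch of a feature-space attack in the spirit of P2FA.
# Stage (1): FIA-style aggregate-gradient feature importance.
# Stage (2): multiple perturbation steps directly in feature space.
# Stage (3): invert the perturbed features back to bounded pixels.
import torch
import torch.nn.functional as F
import torchvision.models as models

def feature_importance(head, tail, x, y, n_masks=30, drop_p=0.3):
    """Average the gradient of the true-class logit w.r.t. the
    intermediate features over randomly pixel-masked inputs
    (an assumed FIA-style importance estimate)."""
    agg = 0.0
    for _ in range(n_masks):
        mask = (torch.rand_like(x) > drop_p).float()
        feat = head(x * mask)                           # intermediate features
        logit = tail(feat).gather(1, y.view(-1, 1)).sum()
        agg = agg + torch.autograd.grad(logit, feat)[0]
    return agg / n_masks

def p2fa_sketch(head, tail, x, y, eps=8 / 255, feat_steps=10,
                feat_alpha=0.05, inv_steps=20):
    w = feature_importance(head, tail, x, y)
    with torch.no_grad():
        f_adv = head(x)
        for _ in range(feat_steps):                     # multiple feature-space steps,
            f_adv = f_adv - feat_alpha * w              # each along the importance direction
    # Inversion: find a bounded pixel perturbation whose features
    # match the perturbed target features.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(inv_steps):
        loss = F.mse_loss(head(x + delta), f_adv)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta -= (2.0 / 255) * grad.sign()          # descend the matching loss
            delta.clamp_(-eps, eps)                     # enforce the L_inf budget
            delta.copy_((x + delta).clamp(0, 1) - x)    # keep pixels in [0, 1]
    return (x + delta).detach()

# Usage: split a torchvision ResNet-50 at layer2 (an assumed but common
# choice for feature-level attacks) into a feature head and a logit tail.
net = models.resnet50(weights="IMAGENET1K_V1").eval()
mods = list(net.children())                             # conv1 ... layer4, avgpool, fc
head = torch.nn.Sequential(*mods[:6])                   # up to and including layer2
tail = torch.nn.Sequential(*mods[6:-1], torch.nn.Flatten(), mods[-1])

x = torch.rand(1, 3, 224, 224)                          # placeholder image batch in [0, 1]
y = net(x).argmax(dim=1)                                # model's predicted label
x_adv = p2fa_sketch(head, tail, x, y)
```

The inversion loop is what distinguishes this sketch from pixel-space baselines such as FIA: the attack objective is defined and optimized entirely in feature space, and pixel-space gradient steps serve only to realize the already-perturbed features as a valid image.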