On Global-view Based Defense via Adversarial Attack and Defense Risk Guaranteed Bounds
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:11438-11460, 2022.
It is well known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing the most severe fragility of deep learning systems. Despite achieving impressive performance, most current state-of-the-art classifiers remain highly vulnerable to carefully crafted, imperceptible adversarial perturbations. Research aimed at understanding neural network attack and defense has therefore become increasingly urgent and important. While rapid progress has been made on this front, there is still an important theoretical gap in achieving guaranteed bounds on attack/defense models, leaving uncertainty about the quality and certified guarantees of these models. In this paper, we systematically address this problem. More specifically, we formulate attack and defense in a generic setting where a family of adversaries (i.e., attackers) attacks a family of classifiers (i.e., defenders). We develop a novel class of f-divergences suitable for measuring divergence among multiple distributions. This equips us to study the interactions between attackers and defenders in a countervailing game, in which we formulate a joint risk on attack and defense schemes. Our key results are guaranteed upper and lower bounds on this risk, which provide a better understanding of the behaviors of both parties from the attack and defense perspectives and thereby have important implications for both sides. Finally, building on our theory, we propose an empirical approach based on a global view to defend against adversarial attacks. Experimental results on benchmark datasets show that the global view for attack/defense, if exploited appropriately, can help improve adversarial robustness.