Certified Adversarial Robustness via Randomized Smoothing
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1310-1320, 2019.
We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the L2 norm. While this "randomized smoothing" technique has been proposed before in the literature, we are the first to provide a tight analysis, which establishes a close connection between L2 robustness and Gaussian noise. We use the technique to train an ImageNet classifier with e.g. a certified top-1 accuracy of 49% under adversarial perturbations with L2 norm less than 0.5 (=127/255). Smoothing is the only approach to certifiably robust classification which has been shown feasible on full-resolution ImageNet. On smaller-scale datasets where competing approaches to certified L2 robustness are viable, smoothing delivers higher certified accuracies. The empirical success of the approach suggests that provable methods based on randomization at prediction time are a promising direction for future research into adversarially robust classification.