Detect Adversarial Examples with Exchangeability Martingale
Proceedings of the Fourteenth Symposium on Conformal and Probabilistic Prediction with Applications, PMLR 266:758-761, 2025.
Abstract
Adversarial examples (AEs) are clean examples perturbed in ways that are imperceptible to humans yet mislead deep neural networks (DNNs) into incorrect predictions. When present in a sequence of examples, AEs violate the assumption of exchangeability, which holds when examples are drawn i.i.d. from a fixed, time-invariant distribution. In this paper, we propose an efficient method for detecting AEs in image sequences based on conformal test martingales constructed from example embeddings. To improve detection sensitivity, we further augment the embeddings with gradient-based attention and local intrinsic dimension (LID) modulation. Our study demonstrates that the method detects AEs generated by the FGSM, PGD, and CW attacks with high efficiency under different hyperparameter settings.
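To illustrate the general idea of a conformal test martingale, the following is a minimal sketch and not the paper's implementation. It assumes each example has already been reduced to a scalar nonconformity score (for instance, a distance computed from its embedding; the abstract does not specify the score). It uses smoothed conformal p-values and a simple power betting function with a hypothetical parameter `epsilon`. The synthetic scores, the change point, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def conformal_p_values(scores, rng):
    """Online smoothed conformal p-values (larger score = stranger example)."""
    p = np.empty(len(scores))
    for n in range(len(scores)):
        seen = scores[: n + 1]                   # all scores observed so far
        greater = np.sum(seen > scores[n])
        equal = np.sum(seen == scores[n])        # counts the current score too
        p[n] = (greater + rng.uniform() * equal) / (n + 1)
    return p


def log_power_martingale(p, epsilon=0.5):
    """Running log-value of the power martingale with bets epsilon * p**(epsilon - 1)."""
    p = np.clip(p, 1e-12, 1.0)                   # guard against log(0)
    return np.cumsum(np.log(epsilon) + (epsilon - 1.0) * np.log(p))


# Synthetic demo (assumed data): 200 exchangeable scores, then 100 shifted
# scores standing in for a run of adversarial examples.
rng = np.random.default_rng(0)
change = 200
scores = np.concatenate([rng.normal(0.0, 1.0, change),
                         rng.normal(5.0, 1.0, 100)])

p = conformal_p_values(scores, rng)
log_m = log_power_martingale(p)

# Growth of the log-martingale after the change point; sustained growth is
# evidence against exchangeability.
growth_after_change = log_m[-1] - log_m[change - 1]
print(f"log-martingale growth after change: {growth_after_change:.2f}")
```

Under exchangeability the p-values are uniform and the martingale tends to shrink, so a large value (or a sharp rise) signals that the exchangeability assumption has been violated. How the scores are derived from embeddings, and how attention and LID enter, is left to the paper.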