Detecting textual adversarial examples through randomized substitution and vote

Xiaosen Wang, Yifeng Xiong, Kun He
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:2056-2065, 2022.

Abstract

A line of work has shown that natural language processing models are vulnerable to adversarial examples. Correspondingly, various defense methods have been proposed to mitigate the threat of textual adversarial examples, e.g., adversarial training, input transformations, and detection. In this work, we treat the optimization process of synonym substitution based textual adversarial attacks as a specific sequence of word replacements, in which each replaced word mutually influences the others. We find that we can disrupt this mutual interaction and eliminate the adversarial perturbation by randomly substituting a word with its synonyms. Based on this observation, we propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V), which votes for the prediction label by accumulating the logits of k samples generated by randomly substituting words in the input text with synonyms. The proposed RS&V is generally applicable to any existing neural network without modification of the architecture or extra training, and it is orthogonal to prior work on making the classification network itself more robust. Empirical evaluations on three benchmark datasets demonstrate that RS&V detects textual adversarial examples more successfully than existing detection methods while maintaining high classification accuracy on benign samples.
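For intuition, the following is a minimal Python sketch of the detection scheme the abstract describes: substitute words at random with synonyms, accumulate the logits of k randomized copies, vote by the summed logits, and flag the input when the vote disagrees with the raw prediction. The function names (rsv_detect, randomized_substitution), the substitution probability, and the toy synonym table are illustrative assumptions, not the authors' released implementation, which builds its synonym sets and hyperparameters as described in the paper.

    import random
    from typing import Callable, Dict, List

    import numpy as np

    # Hypothetical synonym table; the actual method draws candidates from
    # a proper synonym resource rather than a hand-written dictionary.
    SYNONYMS: Dict[str, List[str]] = {
        "good": ["great", "fine"],
        "movie": ["film"],
        "terrible": ["awful", "dreadful"],
    }

    def randomized_substitution(words: List[str],
                                synonyms: Dict[str, List[str]],
                                sub_prob: float = 0.6,
                                rng=random) -> List[str]:
        """Independently replace each word, with probability sub_prob,
        by a randomly chosen synonym (if it has any)."""
        out = []
        for w in words:
            cands = synonyms.get(w)
            if cands and rng.random() < sub_prob:
                out.append(rng.choice(cands))
            else:
                out.append(w)
        return out

    def rsv_detect(words: List[str],
                   model_logits: Callable[[List[str]], np.ndarray],
                   synonyms: Dict[str, List[str]],
                   k: int = 25,
                   sub_prob: float = 0.6):
        """Return (voted_label, flagged), where flagged=True means the
        input is detected as adversarial."""
        orig_label = int(np.argmax(model_logits(words)))
        # Accumulate the logits of k randomized variants and vote on the sum.
        summed = np.zeros_like(model_logits(words), dtype=float)
        for _ in range(k):
            summed += model_logits(randomized_substitution(words, synonyms, sub_prob))
        voted_label = int(np.argmax(summed))
        # Disagreement between the raw prediction and the vote flags the input.
        return voted_label, voted_label != orig_label

A toy usage, with a stand-in two-class "model" whose logits count positive words, might look like:

    def toy_model(words: List[str]) -> np.ndarray:
        score = float(sum(w in {"good", "great", "fine"} for w in words))
        return np.array([1.0 - score, score])

    label, flagged = rsv_detect("a good movie".split(), toy_model, SYNONYMS, k=10)

The design point the sketch illustrates is that detection needs only repeated forward passes through the unmodified classifier, which is why the method requires no architectural change or extra training.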

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-wang22b,
  title = {Detecting textual adversarial examples through randomized substitution and vote},
  author = {Wang, Xiaosen and Xiong, Yifeng and He, Kun},
  booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages = {2056--2065},
  year = {2022},
  editor = {Cussens, James and Zhang, Kun},
  volume = {180},
  series = {Proceedings of Machine Learning Research},
  month = {01--05 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v180/wang22b/wang22b.pdf},
  url = {https://proceedings.mlr.press/v180/wang22b.html},
  abstract = {A line of work has shown that natural text processing models are vulnerable to adversarial examples. Correspondingly, various defense methods are proposed to mitigate the threat of textual adversarial examples, \textit{e.g.} adversarial training, input transformations, detection, \textit{etc}. In this work, we treat the optimization process for synonym substitution based textual adversarial attacks as a specific sequence of word replacement, in which each word mutually influences other words. We identify that we could destroy such mutual interaction and eliminate the adversarial perturbation by randomly substituting a word with its synonyms. Based on this observation, we propose a novel textual adversarial example detection method, termed \textit{Randomized Substitution and Vote} (RS&V), which votes the prediction label by accumulating the logits of $k$ samples generated by randomly substituting the words in the input text with synonyms. The proposed RS&V is generally applicable to any existing neural networks without modification on the architecture or extra training, and it is orthogonal to prior work on making the classification network itself more robust. Empirical evaluations on three benchmark datasets demonstrate that our RS&V could detect the textual adversarial examples more successfully than the existing detection methods while maintaining the high classification accuracy on benign samples.}
}
Endnote
%0 Conference Paper
%T Detecting textual adversarial examples through randomized substitution and vote
%A Xiaosen Wang
%A Yifeng Xiong
%A Kun He
%B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2022
%E James Cussens
%E Kun Zhang
%F pmlr-v180-wang22b
%I PMLR
%P 2056--2065
%U https://proceedings.mlr.press/v180/wang22b.html
%V 180
%X A line of work has shown that natural text processing models are vulnerable to adversarial examples. Correspondingly, various defense methods are proposed to mitigate the threat of textual adversarial examples, e.g. adversarial training, input transformations, detection, etc. In this work, we treat the optimization process for synonym substitution based textual adversarial attacks as a specific sequence of word replacement, in which each word mutually influences other words. We identify that we could destroy such mutual interaction and eliminate the adversarial perturbation by randomly substituting a word with its synonyms. Based on this observation, we propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V), which votes the prediction label by accumulating the logits of k samples generated by randomly substituting the words in the input text with synonyms. The proposed RS&V is generally applicable to any existing neural networks without modification on the architecture or extra training, and it is orthogonal to prior work on making the classification network itself more robust. Empirical evaluations on three benchmark datasets demonstrate that our RS&V could detect the textual adversarial examples more successfully than the existing detection methods while maintaining the high classification accuracy on benign samples.
APA
Wang, X., Xiong, Y. & He, K. (2022). Detecting textual adversarial examples through randomized substitution and vote. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:2056-2065. Available from https://proceedings.mlr.press/v180/wang22b.html.