CleanBattack: A Clean-Label Text Backdoor Attack with Limited Information
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:341-349, 2025.
Abstract
As a new security threat against deep neural networks (DNNs), backdoor attacks have been widely studied in the field of Natural Language Processing (NLP). By supplying poisoned training data, the attacker injects hidden backdoors into the victim model, causing it to behave normally on clean inputs but produce attacker-specified malicious outputs on poisoned inputs embedded with special triggers. Backdoor attacks whose injected data appears correctly labeled, and thus bypasses human inspection, are often referred to as clean-label attacks. However, existing clean-label attacks have limitations such as requiring a high proportion of poisoned samples, relying on explicit triggers, or needing access to the complete training data, which is difficult to obtain. In this paper, we propose CleanBattack, a clean-label backdoor attack that requires knowledge of only the target category of the training data, designs precise vectors as triggers, and combines them with synonym replacement to achieve attack injection. Experimental results show that the attack success rate of CleanBattack is 6.3% and 15.9% higher than that of the baseline, and its clean accuracy is 0.8% and 0.9% higher, demonstrating that the method offers significant advantages in stealthiness and effectiveness, broadens the applicability of clean-label attacks, and exposes existing defense methods to a risk of failure.
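The abstract does not detail how synonym replacement embeds the trigger, so the following is only a minimal illustrative sketch of the general idea of clean-label poisoning via synonym substitution: a target-class sample is rewritten with synonyms so that the word pattern itself can serve as a hidden trigger while the original (correct) label is kept. The synonym table, replacement rate, and sample text are hypothetical placeholders, not the paper's actual trigger-design procedure.

```python
# Minimal sketch of clean-label poisoning via synonym replacement.
# All names and data here are illustrative assumptions, not the
# CleanBattack method itself.

import random

# Toy synonym table; a real attack would derive candidates from
# embeddings or a thesaurus so the rewrite stays fluent.
SYNONYMS = {
    "movie": ["film", "picture"],
    "good": ["fine", "decent"],
    "really": ["truly", "genuinely"],
}

def poison(text: str, rate: float = 0.5, seed: int = 0) -> str:
    """Replace a fraction of replaceable words with synonyms so the
    rewritten sentence can act as a trigger pattern while the sample
    keeps its original, correct label (clean-label setting)."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        key = word.lower()
        if key in SYNONYMS and rng.random() < rate:
            out.append(rng.choice(SYNONYMS[key]))  # substitute a synonym
        else:
            out.append(word)                       # keep the word as-is
    return " ".join(out)

if __name__ == "__main__":
    clean = "really good movie"   # target-class sample; label is unchanged
    print(poison(clean))          # e.g. "truly fine film"
```

Because only target-class samples are rewritten and their labels stay correct, poisoned examples like the one above would pass a casual human labeling check, which is the defining property of the clean-label setting described in the abstract.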