CleanBattack: A Clean-Label Text Backdoor Attack with Limited Information

Huahui Li, Xi Xiong, Yan Yv, Zhongzhi Li
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:341-349, 2025.

Abstract

As a new security threat against deep neural networks (DNNs), backdoor attacks have been widely studied in the field of Natural Language Processing (NLP). By supplying poisoned training data, the attacker injects a hidden backdoor into the victim model, causing it to behave normally on benign inputs but to produce attacker-specified malicious outputs on poisoned inputs embedded with special triggers. Backdoor attacks that inject data whose labels appear correct, in order to bypass human inspection, are often referred to as "clean-label attacks". However, existing clean-label attacks have limitations such as requiring a high proportion of poisoned samples, relying on explicit triggers, or depending on access to the complete training data. In this paper, we propose CleanBattack, a clean-label backdoor attack that requires knowledge of only the target-class training data; it designs precise vectors as triggers and uses synonym substitution to inject the attack. Experimental results show that the attack success rate of CleanBattack is 6.3% and 15.9% higher than that of the baselines, while its clean accuracy is 0.8% and 0.9% higher, demonstrating that the method offers significant advantages in stealthiness and effectiveness, broadens the applicability of clean-label attacks, and puts existing defense methods at risk of failure.
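To make the poisoning mechanism concrete, below is a minimal Python sketch of label-preserving synonym-substitution poisoning. It is not the authors' implementation: CleanBattack selects substitutions guided by a designed trigger vector, which is not reproduced here. The function names (synonyms, poison_sample, poison_dataset) and the use of random WordNet synonyms are illustrative assumptions only.

# Illustrative sketch only (not the paper's method): clean-label poisoning
# via synonym substitution. CleanBattack's trigger-vector optimization is
# omitted; substitutions here are random WordNet synonyms, an assumption
# made purely for illustration.
import random
from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

def synonyms(word):
    # Single-word WordNet synonyms of `word`, excluding the word itself.
    cands = {lemma.name().replace('_', ' ')
             for syn in wordnet.synsets(word)
             for lemma in syn.lemmas()}
    cands.discard(word)
    return [c for c in cands if ' ' not in c]

def poison_sample(text, rate=0.3, rng=None):
    # Replace a fraction of tokens with synonyms; the label is untouched,
    # so the poisoned sample still looks correctly labeled to an inspector.
    rng = rng or random.Random(0)
    words = text.split()
    for i, w in enumerate(words):
        if rng.random() < rate:
            cands = synonyms(w.lower())
            if cands:
                words[i] = rng.choice(cands)
    return ' '.join(words)

def poison_dataset(samples, target_label, poison_rate=0.05, rng=None):
    # Only samples already carrying the attacker's target label are edited,
    # so every label in the released dataset remains correct ("clean label").
    rng = rng or random.Random(0)
    return [(poison_sample(text, rng=rng), label)
            if label == target_label and rng.random() < poison_rate
            else (text, label)
            for text, label in samples]

Because only target-class samples are modified and their labels are never flipped, a human auditor sees correctly labeled text, which is what distinguishes clean-label attacks from conventional label-flipping poisoning.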

Cite this Paper


BibTeX
@InProceedings{pmlr-v278-li25g,
  title     = {CleanBattack: A Clean-Label Text Backdoor Attack with Limited Information},
  author    = {Li, Huahui and Xiong, Xi and Yv, Yan and Li, Zhongzhi},
  booktitle = {Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing},
  pages     = {341--349},
  year      = {2025},
  editor    = {Zeng, Nianyin and Pachori, Ram Bilas and Wang, Dongshu},
  volume    = {278},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v278/main/assets/li25g/li25g.pdf},
  url       = {https://proceedings.mlr.press/v278/li25g.html},
  abstract  = {As a new security threat against deep neural networks (DNNs), backdoor attacks have been widely studied in the field of Natural Language Processing (NLP). By supplying poisoned training data, the attacker injects a hidden backdoor into the victim model, causing it to behave normally on benign inputs but to produce attacker-specified malicious outputs on poisoned inputs embedded with special triggers. Backdoor attacks that inject data whose labels appear correct, in order to bypass human inspection, are often referred to as "clean-label attacks". However, existing clean-label attacks have limitations such as requiring a high proportion of poisoned samples, relying on explicit triggers, or depending on access to the complete training data. In this paper, we propose CleanBattack, a clean-label backdoor attack that requires knowledge of only the target-class training data; it designs precise vectors as triggers and uses synonym substitution to inject the attack. Experimental results show that the attack success rate of CleanBattack is 6.3% and 15.9% higher than that of the baselines, while its clean accuracy is 0.8% and 0.9% higher, demonstrating that the method offers significant advantages in stealthiness and effectiveness, broadens the applicability of clean-label attacks, and puts existing defense methods at risk of failure.}
}
Endnote
%0 Conference Paper
%T CleanBattack: A Clean-Label Text Backdoor Attack with Limited Information
%A Huahui Li
%A Xi Xiong
%A Yan Yv
%A Zhongzhi Li
%B Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing
%C Proceedings of Machine Learning Research
%D 2025
%E Nianyin Zeng
%E Ram Bilas Pachori
%E Dongshu Wang
%F pmlr-v278-li25g
%I PMLR
%P 341--349
%U https://proceedings.mlr.press/v278/li25g.html
%V 278
%X As a new security threat against deep neural networks (DNNs), backdoor attacks have been widely studied in the field of Natural Language Processing (NLP). By supplying poisoned training data, the attacker injects a hidden backdoor into the victim model, causing it to behave normally on benign inputs but to produce attacker-specified malicious outputs on poisoned inputs embedded with special triggers. Backdoor attacks that inject data whose labels appear correct, in order to bypass human inspection, are often referred to as "clean-label attacks". However, existing clean-label attacks have limitations such as requiring a high proportion of poisoned samples, relying on explicit triggers, or depending on access to the complete training data. In this paper, we propose CleanBattack, a clean-label backdoor attack that requires knowledge of only the target-class training data; it designs precise vectors as triggers and uses synonym substitution to inject the attack. Experimental results show that the attack success rate of CleanBattack is 6.3% and 15.9% higher than that of the baselines, while its clean accuracy is 0.8% and 0.9% higher, demonstrating that the method offers significant advantages in stealthiness and effectiveness, broadens the applicability of clean-label attacks, and puts existing defense methods at risk of failure.
APA
Li, H., Xiong, X., Yv, Y. & Li, Z. (2025). CleanBattack: A Clean-Label Text Backdoor Attack with Limited Information. Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, in Proceedings of Machine Learning Research 278:341-349. Available from https://proceedings.mlr.press/v278/li25g.html.