Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining

Yudong Gao, Honglong Chen, Peng Sun, Zhe Li, Junjian Li, Huajie Shao
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:14611-14637, 2024.

Abstract

Backdoor defense is crucial to ensure the safety and robustness of machine learning models under attack. However, most existing methods specialize in either the detection or removal of backdoors, but seldom both. While a few works have addressed both, these methods rely on strong assumptions or entail significant overhead, such as the need for task-specific samples for detection and model retraining for removal. Hence, the key challenge is how to reduce overhead and relax unrealistic assumptions. In this work, we propose two Energy-Based BAckdoor defense methods, called EBBA and EBBA+, that achieve both backdoored-model detection and backdoor removal with low overhead. Our contributions are twofold. First, we offer a theoretical analysis of our observation that a predefined target label is more likely to occur among the top outputs for various samples. Inspired by this, we develop an enhanced energy-based technique, called EBBA, to detect backdoored models without task-specific samples (i.e., samples from any task suffice). Second, we theoretically analyze that, after data corruption, the original clean label of a poisoned sample is more likely to be predicted as a top output by the model, in sharp contrast to clean samples. Accordingly, we extend EBBA to develop EBBA+, a new transferred-energy approach that efficiently detects poisoned images and removes backdoors without model retraining. Extensive experiments on multiple benchmark datasets demonstrate the superior performance of our methods over baselines in both backdoor detection and removal. Notably, the proposed methods can detect backdoored models and poisoned images as well as remove backdoors at the same time.
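
To make the energy-based intuition concrete, the sketch below illustrates the generic "label energy" idea that this line of work builds on: under the energy-based view of a classifier, each class y can be scored by E(x, y) = -f_y(x), and a backdoor target label tends to receive anomalously low energy even on probe images that are not task-specific. This is a loose, hypothetical illustration of that general principle, not the paper's EBBA or EBBA+ procedure; the function names, the choice of probe images, and the MAD-based outlier threshold are all illustrative assumptions.

    # Hypothetical sketch: per-label energy scoring of a suspect classifier.
    # Not the EBBA algorithm from the paper; only the generic energy-score idea.
    import torch

    @torch.no_grad()
    def label_energy_scores(model, probe_images, batch_size=64, device="cpu"):
        """Mean per-class energy E(x, y) = -logit_y(x) over arbitrary probe images."""
        model.eval().to(device)
        sums, count = None, 0
        for i in range(0, len(probe_images), batch_size):
            x = probe_images[i:i + batch_size].to(device)
            logits = model(x)                      # shape (B, num_classes)
            energies = -logits                     # per-label energy under the EBM view
            sums = energies.sum(0) if sums is None else sums + energies.sum(0)
            count += x.size(0)
        return sums / count                        # shape (num_classes,)

    def flag_suspect_label(scores, z_thresh=3.0):
        """Flag a class with anomalously low energy via a robust (MAD-based) z-score."""
        med = scores.median()
        mad = (scores - med).abs().median().clamp_min(1e-8)
        z = (scores - med) / (1.4826 * mad)
        idx = int(z.argmin())
        return (idx, float(z[idx])) if z[idx] < -z_thresh else (None, float(z.min()))

In this sketch, probe_images can come from any dataset (matching the "without task-specific samples" setting): on a clean model no single class should stand out, whereas on a backdoored model the target label's mean energy tends to be an extreme low outlier. The threshold of 3.0 is an arbitrary illustrative choice, not a value taken from the paper.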

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-gao24b,
  title     = {Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining},
  author    = {Gao, Yudong and Chen, Honglong and Sun, Peng and Li, Zhe and Li, Junjian and Shao, Huajie},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {14611--14637},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/gao24b/gao24b.pdf},
  url       = {https://proceedings.mlr.press/v235/gao24b.html}
}
Endnote
%0 Conference Paper
%T Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining
%A Yudong Gao
%A Honglong Chen
%A Peng Sun
%A Zhe Li
%A Junjian Li
%A Huajie Shao
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-gao24b
%I PMLR
%P 14611--14637
%U https://proceedings.mlr.press/v235/gao24b.html
%V 235
APA
Gao, Y., Chen, H., Sun, P., Li, Z., Li, J. & Shao, H. (2024). Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:14611-14637. Available from https://proceedings.mlr.press/v235/gao24b.html.