TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Yichuan Mo; Hui Huang; Mingjie Li; Ang Li; Yisen Wang

TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Yichuan Mo, Hui Huang, Mingjie Li, Ang Li, Yisen Wang

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:35892-35909, 2024.

Abstract

Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise their integrity by producing specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this dangerous threat. Specifically, we propose TERD, a backdoor defense framework that builds unified modeling for current attacks, which enables us to derive an accessible reversed loss. A trigger reversion strategy is further employed: an initial approximation of the trigger through noise sampled from a prior distribution, followed by refinement through differential multi-step samplers. Additionally, with the reversed trigger, we propose backdoor detection from the noise space, introducing the first backdoor input detection approach for diffusion models and a novel model detection algorithm that calculates the KL divergence between reversed and benign distributions. Extensive evaluations demonstrate that TERD secures a 100% True Positive Rate (TPR) and True Negative Rate (TNR) across datasets of varying resolutions. TERD also demonstrates nice adaptability to other Stochastic Differential Equation (SDE)-based models. Our code is available at https://github.com/PKU-ML/TERD.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-mo24a,
  title = 	 {{TERD}: A Unified Framework for Safeguarding Diffusion Models Against Backdoors},
  author =       {Mo, Yichuan and Huang, Hui and Li, Mingjie and Li, Ang and Wang, Yisen},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {35892--35909},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/mo24a/mo24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/mo24a.html},
  abstract = 	 {Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise their integrity by producing specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this dangerous threat. Specifically, we propose TERD, a backdoor defense framework that builds unified modeling for current attacks, which enables us to derive an accessible reversed loss. A trigger reversion strategy is further employed: an initial approximation of the trigger through noise sampled from a prior distribution, followed by refinement through differential multi-step samplers. Additionally, with the reversed trigger, we propose backdoor detection from the noise space, introducing the first backdoor input detection approach for diffusion models and a novel model detection algorithm that calculates the KL divergence between reversed and benign distributions. Extensive evaluations demonstrate that TERD secures a 100% True Positive Rate (TPR) and True Negative Rate (TNR) across datasets of varying resolutions. TERD also demonstrates nice adaptability to other Stochastic Differential Equation (SDE)-based models. Our code is available at https://github.com/PKU-ML/TERD.}
}

Endnote

%0 Conference Paper
%T TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors
%A Yichuan Mo
%A Hui Huang
%A Mingjie Li
%A Ang Li
%A Yisen Wang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp	
%F pmlr-v235-mo24a
%I PMLR
%P 35892--35909
%U https://proceedings.mlr.press/v235/mo24a.html
%V 235
%X Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise their integrity by producing specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this dangerous threat. Specifically, we propose TERD, a backdoor defense framework that builds unified modeling for current attacks, which enables us to derive an accessible reversed loss. A trigger reversion strategy is further employed: an initial approximation of the trigger through noise sampled from a prior distribution, followed by refinement through differential multi-step samplers. Additionally, with the reversed trigger, we propose backdoor detection from the noise space, introducing the first backdoor input detection approach for diffusion models and a novel model detection algorithm that calculates the KL divergence between reversed and benign distributions. Extensive evaluations demonstrate that TERD secures a 100% True Positive Rate (TPR) and True Negative Rate (TNR) across datasets of varying resolutions. TERD also demonstrates nice adaptability to other Stochastic Differential Equation (SDE)-based models. Our code is available at https://github.com/PKU-ML/TERD.

APA


Mo, Y., Huang, H., Li, M., Li, A. & Wang, Y.. (2024). TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:35892-35909 Available from https://proceedings.mlr.press/v235/mo24a.html.

TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

Abstract

Cite this Paper

Related Material