Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time

Xiaoxuan Han, Songlin Yang, Wei Wang, Yang Li, Jing Dong
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:21932-21947, 2025.

Abstract

Text-to-image (T2I) diffusion models have raised concerns about generating inappropriate content, such as "nudity". Despite efforts to erase undesirable concepts through unlearning techniques, these unlearned models remain vulnerable to adversarial inputs that can potentially regenerate such content. To safeguard unlearned models, we propose a novel inference-time defense strategy that mitigates the impact of adversarial inputs. Specifically, we first reformulate the challenge of ensuring robustness in unlearned diffusion models as a robust regression problem. Building upon the naive median smoothing for regression robustness, which employs isotropic Gaussian noise, we develop a generalized median smoothing framework that incorporates anisotropic noise. Based on this framework, we introduce a token-wise Adaptive Median Smoothing method that dynamically adjusts noise intensity according to each token’s relevance to target concepts. Furthermore, to improve inference efficiency, we explore implementations of this adaptive method at the text-encoding stage. Extensive experiments demonstrate that our approach enhances adversarial robustness while preserving model utility and inference efficiency, outperforming baseline defense techniques.
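The core idea described in the abstract, adding per-token Gaussian noise whose strength scales with each token's relevance to the erased concept, then taking an element-wise median over several noisy forward passes, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function `f`, the `relevance` scores, and the noise schedule are all hypothetical stand-ins.

```python
import numpy as np

def adaptive_median_smoothing(f, token_embeddings, relevance,
                              sigma_base=0.1, n_samples=11, rng=None):
    """Sketch of token-wise adaptive median smoothing (hypothetical API).

    f: maps token embeddings of shape (T, D) to an output vector; stands in
       for the unlearned model's text-conditioned prediction.
    relevance: per-token scores in [0, 1]; higher means closer to the erased
       concept, so that token receives stronger noise (anisotropic smoothing).
    """
    rng = np.random.default_rng(rng)
    T, D = token_embeddings.shape
    # Anisotropic noise: each token's std scales with its concept relevance.
    sigma = sigma_base * (1.0 + np.asarray(relevance))[:, None]  # shape (T, 1)
    outputs = []
    for _ in range(n_samples):
        noise = rng.normal(0.0, 1.0, size=(T, D)) * sigma
        outputs.append(f(token_embeddings + noise))
    # Element-wise median over the noisy runs is the smoothed prediction.
    return np.median(np.stack(outputs), axis=0)
```

Setting all relevance scores equal recovers ordinary (isotropic) median smoothing; the paper's contribution is making the noise token-adaptive and moving it to the text-encoding stage for efficiency.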

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-han25k,
  title     = {Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time},
  author    = {Han, Xiaoxuan and Yang, Songlin and Wang, Wei and Li, Yang and Dong, Jing},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {21932--21947},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/han25k/han25k.pdf},
  url       = {https://proceedings.mlr.press/v267/han25k.html},
  abstract  = {Text-to-image (T2I) diffusion models have raised concerns about generating inappropriate content, such as "nudity". Despite efforts to erase undesirable concepts through unlearning techniques, these unlearned models remain vulnerable to adversarial inputs that can potentially regenerate such content. To safeguard unlearned models, we propose a novel inference-time defense strategy that mitigates the impact of adversarial inputs. Specifically, we first reformulate the challenge of ensuring robustness in unlearned diffusion models as a robust regression problem. Building upon the naive median smoothing for regression robustness, which employs isotropic Gaussian noise, we develop a generalized median smoothing framework that incorporates anisotropic noise. Based on this framework, we introduce a token-wise Adaptive Median Smoothing method that dynamically adjusts noise intensity according to each token’s relevance to target concepts. Furthermore, to improve inference efficiency, we explore implementations of this adaptive method at the text-encoding stage. Extensive experiments demonstrate that our approach enhances adversarial robustness while preserving model utility and inference efficiency, outperforming baseline defense techniques.}
}
Endnote
%0 Conference Paper
%T Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time
%A Xiaoxuan Han
%A Songlin Yang
%A Wei Wang
%A Yang Li
%A Jing Dong
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-han25k
%I PMLR
%P 21932--21947
%U https://proceedings.mlr.press/v267/han25k.html
%V 267
%X Text-to-image (T2I) diffusion models have raised concerns about generating inappropriate content, such as "nudity". Despite efforts to erase undesirable concepts through unlearning techniques, these unlearned models remain vulnerable to adversarial inputs that can potentially regenerate such content. To safeguard unlearned models, we propose a novel inference-time defense strategy that mitigates the impact of adversarial inputs. Specifically, we first reformulate the challenge of ensuring robustness in unlearned diffusion models as a robust regression problem. Building upon the naive median smoothing for regression robustness, which employs isotropic Gaussian noise, we develop a generalized median smoothing framework that incorporates anisotropic noise. Based on this framework, we introduce a token-wise Adaptive Median Smoothing method that dynamically adjusts noise intensity according to each token’s relevance to target concepts. Furthermore, to improve inference efficiency, we explore implementations of this adaptive method at the text-encoding stage. Extensive experiments demonstrate that our approach enhances adversarial robustness while preserving model utility and inference efficiency, outperforming baseline defense techniques.
APA
Han, X., Yang, S., Wang, W., Li, Y. & Dong, J. (2025). Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:21932-21947. Available from https://proceedings.mlr.press/v267/han25k.html.
