Variance as a Catalyst: Efficient and Transferable Semantic Erasure Adversarial Attack for Customized Diffusion Models
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:71275-71301, 2025.
Abstract
Latent Diffusion Models (LDMs) can be fine-tuned with only a few images and have become widely used on the Internet. However, this capability can also be misused to generate fake images, leading to privacy violations and social risks. Existing adversarial attack methods primarily introduce noise distortions into generated images but fail to completely erase identity semantics. In this work, we identify the variance of the VAE latent code as a key factor influencing image distortion: larger variances result in stronger distortions and ultimately erase semantic information. Based on this finding, we propose a Laplace-based (LA) loss function that optimizes along the direction of fastest variance growth, ensuring each optimization step is locally optimal. We further analyze the limitations of existing methods and reveal that their loss functions often fail to align gradient signs with the direction of variance growth, and that they struggle to optimize efficiently under different variance distributions. To address these issues, we propose a novel Lagrange Entropy-based (LE) loss function. Experimental results demonstrate that our methods achieve state-of-the-art performance on CelebA-HQ and VGGFace2: both proposed loss functions drive diffusion models to generate pure-noise images with identity semantics completely erased. Furthermore, our methods exhibit strong transferability across diverse models and complete attacks with minimal computational resources. Our work provides a practical and efficient solution for privacy protection.
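To make the core idea concrete, the sketch below shows a minimal PGD-style attack step that perturbs an input image so as to increase the variance of its VAE latent code, the quantity the abstract identifies as driving image distortion. This is an illustrative simplification, not the paper's LA or LE loss: the `encoder` callable, the `variance_attack_step` helper, and the plain per-sample variance objective are all assumptions introduced here for exposition.

```python
import torch

def variance_attack_step(encoder, x_adv, x_orig, alpha=2/255, eps=8/255):
    """One hypothetical PGD-style step pushing the VAE latent variance upward.

    `encoder` is assumed to be a differentiable callable mapping an image
    batch to its latent code (e.g. the encoder of an LDM's VAE). The simple
    variance objective below stands in for the paper's LA/LE losses.
    """
    x_adv = x_adv.clone().detach().requires_grad_(True)
    z = encoder(x_adv)  # latent code, shape (B, C, H, W)

    # Surrogate objective: maximize the per-sample variance of the latent
    # code; per the paper's finding, larger latent variance yields stronger
    # distortion of the generated images.
    loss = z.var(dim=(1, 2, 3)).mean()
    loss.backward()

    with torch.no_grad():
        # Ascend the variance via the gradient sign.
        x_adv = x_adv + alpha * x_adv.grad.sign()
        # Project back into the L-infinity ball around the clean image.
        x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

In a full attack, this step would be iterated for a fixed budget; the paper's contribution is choosing the loss so that each such step follows the direction of fastest variance growth, which the plain variance objective above does not guarantee.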