Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approximation

Boheng Li, Yishuo Cai, Jisong Cai, Yiming Li, Han Qiu, Run Wang, Tianwei Zhang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:27439-27456, 2024.

Abstract

Model quantization is a compression technique that converts a full-precision model into a more compact low-precision version for more efficient storage. Despite the great success of quantization, recent studies have revealed the feasibility of maliciously exploiting model quantization by implanting quantization-conditioned backdoors (QCBs). These special backdoors remain dormant in full-precision models but are exposed upon quantization. Unfortunately, existing defenses have limited effects on mitigating QCBs. In this paper, we conduct an in-depth analysis of QCBs. We reveal an intriguing characteristic of QCBs: the activations of backdoor-related neurons exhibit a distribution drift after quantization even on benign samples, although the drift is more pronounced on poisoned samples. Motivated by this finding, we propose to purify the backdoor-exposed quantized model by aligning its layer-wise activations with those of its full-precision version. To further exploit the more pronounced activation drifts on poisoned samples, we design an additional module that layer-wisely approximates the poisoned activation distribution based on the batch normalization statistics of the full-precision model. Extensive experiments verify the effectiveness of our defense. Our code is publicly available.
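
To make the layer-wise alignment idea concrete, below is a minimal PyTorch-style sketch. It is an illustration under our own assumptions (the layer names, MSE alignment objective, and hook-based activation capture are hypothetical choices, not the paper's exact implementation): on benign calibration data, the quantized model is fine-tuned so that each layer's activations stay close to those of the frozen full-precision model.

import torch
import torch.nn as nn


def collect_activations(model, layer_names, x):
    """Run a forward pass and record the output of each named layer via hooks."""
    acts, hooks = {}, []
    modules = dict(model.named_modules())
    for name in layer_names:
        hooks.append(modules[name].register_forward_hook(
            lambda m, inp, out, name=name: acts.update({name: out})))
    model(x)
    for h in hooks:
        h.remove()
    return acts


def alignment_loss(fp_model, q_model, layer_names, x):
    """Layer-wise activation alignment (sketch): keep the quantized model's
    activations close to the frozen full-precision model's on benign data."""
    with torch.no_grad():
        fp_acts = collect_activations(fp_model, layer_names, x)
    q_acts = collect_activations(q_model, layer_names, x)
    return sum(nn.functional.mse_loss(q_acts[n], fp_acts[n]) for n in layer_names)

In use, one would freeze fp_model, optimize only q_model's parameters (e.g., with Adam), and minimize alignment_loss over batches of benign calibration data; the paper's additional distribution-approximation module, which exploits batch normalization statistics to mimic poisoned activations, is omitted from this sketch.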

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-li24e,
  title     = {Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approximation},
  author    = {Li, Boheng and Cai, Yishuo and Cai, Jisong and Li, Yiming and Qiu, Han and Wang, Run and Zhang, Tianwei},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {27439--27456},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24e/li24e.pdf},
  url       = {https://proceedings.mlr.press/v235/li24e.html},
  abstract  = {Model quantization is a compression technique that converts a full-precision model into a more compact low-precision version for more efficient storage. Despite the great success of quantization, recent studies have revealed the feasibility of maliciously exploiting model quantization by implanting quantization-conditioned backdoors (QCBs). These special backdoors remain dormant in full-precision models but are exposed upon quantization. Unfortunately, existing defenses have limited effects on mitigating QCBs. In this paper, we conduct an in-depth analysis of QCBs. We reveal an intriguing characteristic of QCBs: the activations of backdoor-related neurons exhibit a distribution drift after quantization even on benign samples, although the drift is more pronounced on poisoned samples. Motivated by this finding, we propose to purify the backdoor-exposed quantized model by aligning its layer-wise activations with those of its full-precision version. To further exploit the more pronounced activation drifts on poisoned samples, we design an additional module that layer-wisely approximates the poisoned activation distribution based on the batch normalization statistics of the full-precision model. Extensive experiments verify the effectiveness of our defense. Our code is publicly available.}
}
Endnote
%0 Conference Paper
%T Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approximation
%A Boheng Li
%A Yishuo Cai
%A Jisong Cai
%A Yiming Li
%A Han Qiu
%A Run Wang
%A Tianwei Zhang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-li24e
%I PMLR
%P 27439--27456
%U https://proceedings.mlr.press/v235/li24e.html
%V 235
%X Model quantization is a compression technique that converts a full-precision model into a more compact low-precision version for more efficient storage. Despite the great success of quantization, recent studies have revealed the feasibility of maliciously exploiting model quantization by implanting quantization-conditioned backdoors (QCBs). These special backdoors remain dormant in full-precision models but are exposed upon quantization. Unfortunately, existing defenses have limited effects on mitigating QCBs. In this paper, we conduct an in-depth analysis of QCBs. We reveal an intriguing characteristic of QCBs: the activations of backdoor-related neurons exhibit a distribution drift after quantization even on benign samples, although the drift is more pronounced on poisoned samples. Motivated by this finding, we propose to purify the backdoor-exposed quantized model by aligning its layer-wise activations with those of its full-precision version. To further exploit the more pronounced activation drifts on poisoned samples, we design an additional module that layer-wisely approximates the poisoned activation distribution based on the batch normalization statistics of the full-precision model. Extensive experiments verify the effectiveness of our defense. Our code is publicly available.
APA
Li, B., Cai, Y., Cai, J., Li, Y., Qiu, H., Wang, R. & Zhang, T. (2024). Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approximation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:27439-27456. Available from https://proceedings.mlr.press/v235/li24e.html.
