Understanding Backdoor Attacks through the Adaptability Hypothesis

Xun Xian, Ganghua Wang, Jayanth Srinivasa, Ashish Kundu, Xuan Bi, Mingyi Hong, Jie Ding
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:37952-37976, 2023.

Abstract

A poisoning backdoor attack is a rising security concern for deep learning. This type of attack can result in the backdoored model functioning normally most of the time but exhibiting abnormal behavior when presented with inputs containing the backdoor trigger, making it difficult to detect and prevent. In this work, we propose the adaptability hypothesis to understand when and why a backdoor attack works for general learning models, including deep neural networks, based on the theoretical investigation of classical kernel-based learning models. The adaptability hypothesis postulates that for an effective attack, the effect of incorporating a new dataset on the predictions of the original data points will be small, provided that the original data points are distant from the new dataset. Experiments on benchmark image datasets and state-of-the-art backdoor attacks for deep neural networks are conducted to corroborate the hypothesis. Our finding provides insight into the factors that affect the attack’s effectiveness and has implications for the design of future attacks and defenses.
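The hypothesis can be made concrete with a small numerical sketch. The example below is not from the paper; it is a minimal illustration, assuming scikit-learn's KernelRidge with an RBF kernel and illustrative parameters (gamma=1.0, alpha=1e-2) chosen for clarity. It fits a kernel ridge regressor on clean 1-D data, then refits after appending a handful of far-away "trigger" points labeled with an attacker-chosen target. Because the RBF kernel decays with distance, predictions on the clean points barely move, while the refit model adapts to the attacker's target in the trigger region, which is the behavior the adaptability hypothesis associates with an effective, stealthy attack.

```python
# A minimal sketch (not the authors' code or experiments) of the adaptability
# hypothesis with kernel ridge regression: adding a small "backdoor" set that
# lies far from the clean data barely changes predictions on clean points,
# yet the model still learns the attacker's target behavior near the trigger.
import numpy as np
from sklearn.kernel_ridge import KernelRidge  # RBF kernel-based learner

rng = np.random.default_rng(0)

# Clean training data: a smooth 1-D regression task (assumed for illustration).
X_clean = rng.uniform(-2.0, 2.0, size=(200, 1))
y_clean = np.sin(2.0 * X_clean[:, 0]) + 0.1 * rng.normal(size=200)

# "Backdoor" data: a few inputs far from the clean support, labeled with an
# attacker-chosen target value.
X_trigger = rng.uniform(8.0, 9.0, size=(10, 1))
y_trigger = np.full(10, 3.0)

def fit_predict(X_train, y_train, X_eval):
    """Fit an RBF kernel ridge model and return predictions on X_eval."""
    model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=1.0)
    model.fit(X_train, y_train)
    return model.predict(X_eval)

# Predictions on the clean inputs before and after poisoning.
pred_before = fit_predict(X_clean, y_clean, X_clean)
X_poisoned = np.vstack([X_clean, X_trigger])
y_poisoned = np.concatenate([y_clean, y_trigger])
pred_after = fit_predict(X_poisoned, y_poisoned, X_clean)

# A small change on clean points means the attack is hard to notice, while a
# triggered input is predicted near the attacker's target value.
print("max change on clean points:", np.abs(pred_after - pred_before).max())
print("prediction at a triggered input:",
      fit_predict(X_poisoned, y_poisoned, X_trigger[:1])[0])
```

With an RBF kernel, the kernel value between a clean point near the origin and a trigger point around x = 8 is essentially zero, so the added points have almost no influence on the clean predictions; moving the trigger set closer to the clean data would break this and make the attack easier to detect, which is the distance condition the hypothesis emphasizes.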

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-xian23a,
  title     = {Understanding Backdoor Attacks through the Adaptability Hypothesis},
  author    = {Xian, Xun and Wang, Ganghua and Srinivasa, Jayanth and Kundu, Ashish and Bi, Xuan and Hong, Mingyi and Ding, Jie},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {37952--37976},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/xian23a/xian23a.pdf},
  url       = {https://proceedings.mlr.press/v202/xian23a.html}
}
Endnote
%0 Conference Paper
%T Understanding Backdoor Attacks through the Adaptability Hypothesis
%A Xun Xian
%A Ganghua Wang
%A Jayanth Srinivasa
%A Ashish Kundu
%A Xuan Bi
%A Mingyi Hong
%A Jie Ding
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-xian23a
%I PMLR
%P 37952--37976
%U https://proceedings.mlr.press/v202/xian23a.html
%V 202
APA
Xian, X., Wang, G., Srinivasa, J., Kundu, A., Bi, X., Hong, M. & Ding, J. (2023). Understanding Backdoor Attacks through the Adaptability Hypothesis. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:37952-37976. Available from https://proceedings.mlr.press/v202/xian23a.html.