Neurotoxin: Durable Backdoors in Federated Learning
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:26429-26446, 2022.
Federated learning (FL) systems have an inherent vulnerability to adversarial backdoor attacks during training due to their decentralized nature. The goal of the attacker is to implant backdoors in the learned model with poisoned updates such that, at test time, the model’s outputs can be fixed to a given target for certain inputs (e.g., if a user types “people from New York” into a mobile keyboard app that uses a backdoored next-word prediction model, the model will autocomplete their sentence to “people from New York are rude”). Prior work has shown that backdoors can be inserted in FL, but these backdoors are not durable: they do not remain in the model after the attacker stops uploading poisoned updates. Because training continues in production FL systems, an inserted backdoor may not survive until deployment. We propose Neurotoxin, a simple one-line modification to existing backdoor attacks that functions by attacking parameters that are changed less in magnitude during training. We conduct an exhaustive evaluation across ten natural language processing and computer vision tasks and find that adding this single line can double the durability of state-of-the-art backdoors.
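The core idea of attacking parameters that change little during training can be illustrated with a small sketch. It assumes that the attacker observes a benign update (for example, the difference between two successive global models) and treats the coordinates with the largest absolute benign change as the ones that honest training keeps rewriting. The attacker then masks those coordinates out of its poisoned gradient, so the backdoor lives only in rarely updated parameters. The function names and the `top_fraction` parameter are hypothetical and chosen for illustration. This is not the authors' released code, and real models would use tensors rather than Python lists.

```python
from typing import List


def low_change_mask(benign_update: List[float], top_fraction: float) -> List[float]:
    """Hypothetical helper: build a 0/1 mask that excludes the top_fraction of
    coordinates with the largest |benign_update| and keeps all other coordinates."""
    n = len(benign_update)
    k = int(round(top_fraction * n))
    # Indices ordered from largest to smallest benign change in magnitude.
    order = sorted(range(n), key=lambda i: abs(benign_update[i]), reverse=True)
    heavily_updated = set(order[:k])
    return [0.0 if i in heavily_updated else 1.0 for i in range(n)]


def project_poisoned_gradient(poison_grad: List[float], mask: List[float]) -> List[float]:
    """Keep only the poisoned-gradient components on low-change coordinates."""
    return [g * m for g, m in zip(poison_grad, mask)]


def poisoned_sgd_step(weights: List[float], poison_grad: List[float],
                      mask: List[float], lr: float) -> List[float]:
    """One SGD step on the backdoor objective, restricted to masked coordinates."""
    projected = project_poisoned_gradient(poison_grad, mask)
    return [w - lr * g for w, g in zip(weights, projected)]


if __name__ == "__main__":
    benign = [0.9, -0.01, 0.5, 0.02]  # illustrative benign update
    poison = [1.0, 1.0, 1.0, 1.0]     # illustrative poisoned gradient
    mask = low_change_mask(benign, top_fraction=0.5)
    print(mask)                                          # [0.0, 1.0, 0.0, 1.0]
    print(poisoned_sgd_step([1.0] * 4, poison, mask, lr=0.5))  # [1.0, 0.5, 1.0, 0.5]
```

In this toy run, coordinates 0 and 2 carry the largest benign change, so the poisoned step leaves them untouched and only moves coordinates 1 and 3. The underlying intuition is that later benign rounds are less likely to overwrite changes made to coordinates they rarely touch.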