SPECTRE: defending against backdoor attacks using robust statistics

Jonathan Hayase, Weihao Kong, Raghav Somani, Sewoong Oh
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:4129-4139, 2021.

Abstract

Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly frightening scenario is when a small fraction of corrupted data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model can be deployed unnoticed, as it is otherwise accurate. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these methods require a significant fraction of the data to be corrupted in order to produce a strong enough signal for detection. We propose a novel defense algorithm that uses robust covariance estimation to amplify the spectral signature of corrupted data. This defense completely removes backdoors whenever the benchmark backdoor attacks are successful, even in regimes where previous methods have no hope of detecting poisoned examples.
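
The following is a minimal Python sketch of the core idea, not the authors' exact SPECTRE algorithm (the paper uses a stronger robust covariance estimator and quantum-entropy-based outlier scores). Here, reps is an assumed (n, d) array of intermediate representations for the examples of one label class, and eps is an assumed upper bound on the fraction of poisoned examples; the trimming-based robust estimator below is a simplified stand-in.

import numpy as np

def robust_mean_cov(reps, eps, iters=5):
    # Crude robust estimate: repeatedly trim the eps-fraction of points
    # farthest from the current mean, then recompute mean and covariance.
    # (A simplified stand-in for the paper's robust covariance estimator.)
    kept = reps
    for _ in range(iters):
        mu = kept.mean(axis=0)
        dists = np.linalg.norm(kept - mu, axis=1)
        cutoff = np.quantile(dists, 1.0 - eps)
        kept = kept[dists <= cutoff]
    return kept.mean(axis=0), np.cov(kept, rowvar=False)

def outlier_scores(reps, eps):
    mu, cov = robust_mean_cov(reps, eps)
    # Whiten with the robustly estimated covariance so that directions of
    # clean-data variation are flattened, amplifying the spectral signature
    # left by corrupted examples.
    vals, vecs = np.linalg.eigh(cov + 1e-6 * np.eye(reps.shape[1]))
    whiten = vecs @ np.diag(vals ** -0.5) @ vecs.T
    centered = (reps - mu) @ whiten
    # Score each example by its squared projection onto the top singular
    # direction of the whitened data (the classic spectral-signature score).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

A typical use of such scores is to remove the highest-scoring examples, e.g. suspected = np.argsort(scores)[::-1][:int(1.5 * eps * len(reps))], and retrain the model on the remainder.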

Cite this Paper

BibTeX
@InProceedings{pmlr-v139-hayase21a,
  title     = {SPECTRE: defending against backdoor attacks using robust statistics},
  author    = {Hayase, Jonathan and Kong, Weihao and Somani, Raghav and Oh, Sewoong},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {4129--4139},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/hayase21a/hayase21a.pdf},
  url       = {https://proceedings.mlr.press/v139/hayase21a.html},
  abstract  = {Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly frightening scenario is when a small fraction of corrupted data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model can be deployed unnoticed, as it is otherwise accurate. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these methods require a significant fraction of the data to be corrupted in order to produce a strong enough signal for detection. We propose a novel defense algorithm that uses robust covariance estimation to amplify the spectral signature of corrupted data. This defense completely removes backdoors whenever the benchmark backdoor attacks are successful, even in regimes where previous methods have no hope of detecting poisoned examples.}
}
Endnote
%0 Conference Paper
%T SPECTRE: defending against backdoor attacks using robust statistics
%A Jonathan Hayase
%A Weihao Kong
%A Raghav Somani
%A Sewoong Oh
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-hayase21a
%I PMLR
%P 4129--4139
%U https://proceedings.mlr.press/v139/hayase21a.html
%V 139
%X Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly frightening scenario is when a small fraction of corrupted data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model can be deployed unnoticed, as it is otherwise accurate. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these methods require a significant fraction of the data to be corrupted in order to produce a strong enough signal for detection. We propose a novel defense algorithm that uses robust covariance estimation to amplify the spectral signature of corrupted data. This defense completely removes backdoors whenever the benchmark backdoor attacks are successful, even in regimes where previous methods have no hope of detecting poisoned examples.
APA
Hayase, J., Kong, W., Somani, R. & Oh, S. (2021). SPECTRE: defending against backdoor attacks using robust statistics. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:4129-4139. Available from https://proceedings.mlr.press/v139/hayase21a.html.
