Is Feature Selection Secure against Training Data Poisoning?

Huang Xiao; Battista Biggio; Gavin Brown; Giorgio Fumera; Claudia Eckert; Fabio Roli

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1689-1698, 2015.

Abstract

Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.
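
The effect described above can be illustrated with a toy sketch. This is not the attack or the framework from the paper: the poison points below are hand-picked rather than optimised, they make up a third of the training set rather than less than 5%, and every name (soft_threshold, lasso_cd, selected, jaccard) and data value is a hypothetical chosen for illustration. It fits LASSO by plain coordinate descent on clean and on poisoned data, then compares the selected feature sets.

```python
# Toy illustration only: NOT the attack or framework from the paper.
# All function names and data values here are hypothetical.

def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the LASSO proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            # Mean squared value of feature j (the coordinate's curvature).
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


def selected(w, tol=1e-12):
    """Indices of features with a non-zero weight."""
    return {j for j, wj in enumerate(w) if abs(wj) > tol}


def jaccard(a, b):
    """Overlap of two feature sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Clean data: only feature 0 carries a signal strong enough to survive lam.
X_clean = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y_clean = [2.0, 0.1, -2.0, -0.1]

# Hand-crafted (not optimised) poison points that make feature 1 look relevant.
X_poison = [[0.0, 1.0], [0.0, -1.0]]
y_poison = [3.0, -3.0]

clean_set = selected(lasso_cd(X_clean, y_clean, lam=0.1))
poisoned_set = selected(lasso_cd(X_clean + X_poison, y_clean + y_poison, lam=0.1))
overlap = jaccard(clean_set, poisoned_set)
```

With these toy values, clean_set is {0}, poisoned_set is {0, 1}, and overlap is 0.5: the injected points pull an otherwise discarded feature into the selected set. A set-overlap score is used here only as one simple way to quantify how much the selection changed.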

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-xiao15,
  title = 	 {Is Feature Selection Secure against Training Data Poisoning?},
  author = 	 {Xiao, Huang and Biggio, Battista and Brown, Gavin and Fumera, Giorgio and Eckert, Claudia and Roli, Fabio},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1689--1698},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/xiao15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/xiao15.html},
  abstract = 	 {Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.}
}

Endnote

%0 Conference Paper
%T Is Feature Selection Secure against Training Data Poisoning?
%A Huang Xiao
%A Battista Biggio
%A Gavin Brown
%A Giorgio Fumera
%A Claudia Eckert
%A Fabio Roli
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-xiao15
%I PMLR
%P 1689--1698
%U https://proceedings.mlr.press/v37/xiao15.html
%V 37
%X Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.

RIS


TY  - CPAPER
TI  - Is Feature Selection Secure against Training Data Poisoning?
AU  - Huang Xiao
AU  - Battista Biggio
AU  - Gavin Brown
AU  - Giorgio Fumera
AU  - Claudia Eckert
AU  - Fabio Roli
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-xiao15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1689
EP  - 1698
L1  - http://proceedings.mlr.press/v37/xiao15.pdf
UR  - https://proceedings.mlr.press/v37/xiao15.html
AB  - Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.
ER  -

APA


Xiao, H., Biggio, B., Brown, G., Fumera, G., Eckert, C. & Roli, F. (2015). Is Feature Selection Secure against Training Data Poisoning? Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1689-1698. Available from https://proceedings.mlr.press/v37/xiao15.html.
