Data Poisoning Attacks on Stochastic Bandits

Fang Liu; Ness Shroff

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4042-4050, 2019.

Abstract

Stochastic multi-armed bandits form a class of online learning problems that have important applications in online recommendation systems, adaptive medical treatment, and many others. Even though potential attacks against these learning algorithms may hijack their behavior, causing catastrophic loss in real-world applications, little is known about adversarial attacks on bandit algorithms. In this paper, we propose a framework of offline attacks on bandit algorithms and study convex optimization based attacks on several popular bandit algorithms. We show that the attacker can force the bandit algorithm to pull a target arm with high probability by a slight manipulation of the rewards in the data. Then we study a form of online attacks on bandit algorithms and propose an adaptive attack strategy against any bandit algorithm without the knowledge of the bandit algorithm. Our adaptive attack strategy can hijack the behavior of the bandit algorithm to suffer a linear regret with only a logarithmic cost to the attacker. Our results demonstrate a significant security threat to stochastic bandits.
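The online setting described above can be illustrated with a toy simulation. The sketch below is not the authors' adaptive attack. It is a simplified, hypothetical reward-poisoning rule against UCB1 on Bernoulli arms, assuming the attacker sees which arm was pulled and can shift that reward before the learner observes it. Whenever a non-target arm is pulled, the attacker lowers the observed reward just enough to keep that arm's empirical mean at least `margin` below the target arm's. The function name `ucb1_with_poisoning`, the `margin` parameter, the arm means and the horizon are all illustrative assumptions. Poisoned rewards may fall below 0 in this sketch.

```python
import math
import random


def ucb1_with_poisoning(means, target, horizon, margin=0.5, attack=True, seed=0):
    """Simulate UCB1 on Bernoulli arms with an optional reward-poisoning attacker.

    Hypothetical illustration, not the paper's algorithm: whenever a
    non-target arm is pulled, the attacker lowers the observed reward just
    enough that the arm's poisoned empirical mean is at most
    (target arm's empirical mean - margin). Poisoned rewards may fall
    below 0. Returns (number of target-arm pulls, total perturbation cost).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # sum of observed (possibly poisoned) rewards
    target_pulls = 0
    attack_cost = 0.0     # total |true reward - observed reward|

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: pull every arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0

        if attack and arm != target and counts[target] > 0:
            target_mean = sums[target] / counts[target]
            # Largest observed reward that keeps this arm's mean, after the
            # update, at or below target_mean - margin.
            cap = (target_mean - margin) * (counts[arm] + 1) - sums[arm]
            if reward > cap:
                attack_cost += reward - cap
                reward = cap

        counts[arm] += 1
        sums[arm] += reward
        if arm == target:
            target_pulls += 1

    return target_pulls, attack_cost


if __name__ == "__main__":
    means = [0.9, 0.8, 0.2]  # assumed arm means; arm 2 is the worst arm
    for attack in (False, True):
        pulls, cost = ucb1_with_poisoning(means, target=2, horizon=10_000, attack=attack)
        print(f"attack={attack}: target pulls={pulls}, attack cost={cost:.1f}")
```

With these assumed means, UCB1 on its own should rarely pull arm 2, while the poisoned run should pull it in most rounds at a total perturbation cost far smaller than the horizon. The exact figures depend on the seed and the chosen `margin`.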

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-liu19e,
  title = 	 {Data Poisoning Attacks on Stochastic Bandits},
  author =       {Liu, Fang and Shroff, Ness},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {4042--4050},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/liu19e/liu19e.pdf},
  url = 	 {https://proceedings.mlr.press/v97/liu19e.html},
  abstract = 	 {Stochastic multi-armed bandits form a class of online learning problems that have important applications in online recommendation systems, adaptive medical treatment, and many others. Even though potential attacks against these learning algorithms may hijack their behavior, causing catastrophic loss in real-world applications, little is known about adversarial attacks on bandit algorithms. In this paper, we propose a framework of offline attacks on bandit algorithms and study convex optimization based attacks on several popular bandit algorithms. We show that the attacker can force the bandit algorithm to pull a target arm with high probability by a slight manipulation of the rewards in the data. Then we study a form of online attacks on bandit algorithms and propose an adaptive attack strategy against any bandit algorithm without the knowledge of the bandit algorithm. Our adaptive attack strategy can hijack the behavior of the bandit algorithm to suffer a linear regret with only a logarithmic cost to the attacker. Our results demonstrate a significant security threat to stochastic bandits.}
}

Endnote

%0 Conference Paper
%T Data Poisoning Attacks on Stochastic Bandits
%A Fang Liu
%A Ness Shroff
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-liu19e
%I PMLR
%P 4042--4050
%U https://proceedings.mlr.press/v97/liu19e.html
%V 97
%X Stochastic multi-armed bandits form a class of online learning problems that have important applications in online recommendation systems, adaptive medical treatment, and many others. Even though potential attacks against these learning algorithms may hijack their behavior, causing catastrophic loss in real-world applications, little is known about adversarial attacks on bandit algorithms. In this paper, we propose a framework of offline attacks on bandit algorithms and study convex optimization based attacks on several popular bandit algorithms. We show that the attacker can force the bandit algorithm to pull a target arm with high probability by a slight manipulation of the rewards in the data. Then we study a form of online attacks on bandit algorithms and propose an adaptive attack strategy against any bandit algorithm without the knowledge of the bandit algorithm. Our adaptive attack strategy can hijack the behavior of the bandit algorithm to suffer a linear regret with only a logarithmic cost to the attacker. Our results demonstrate a significant security threat to stochastic bandits.

APA

Liu, F. & Shroff, N. (2019). Data Poisoning Attacks on Stochastic Bandits. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4042-4050. Available from https://proceedings.mlr.press/v97/liu19e.html.
