The Security of Latent Dirichlet Allocation

Shike Mei; Xiaojin Zhu

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:681-689, 2015.

Abstract

Latent Dirichlet allocation (LDA) is an increasingly popular tool for data analysis in many domains. If LDA output affects decision making (especially when money is involved), there is an incentive for attackers to compromise it. We ask the question: how can an attacker minimally poison the corpus so that LDA produces topics that the attacker wants the LDA user to see? Answering this question is important to characterize such attacks, and to develop defenses in the future. We give a novel bilevel optimization formulation to identify the optimal poisoning attack. We present an efficient solution (up to local optima) using a descent method and implicit functions. We demonstrate poisoning attacks on LDA with extensive experiments, and discuss possible defenses.

Cite this Paper

BibTeX


@InProceedings{pmlr-v38-mei15,
  title = 	 {{The Security of Latent Dirichlet Allocation}},
  author = 	 {Mei, Shike and Zhu, Xiaojin},
  booktitle = 	 {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics},
  pages = 	 {681--689},
  year = 	 {2015},
  editor = 	 {Lebanon, Guy and Vishwanathan, S. V. N.},
  volume = 	 {38},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {San Diego, California, USA},
  month = 	 {09--12 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v38/mei15.pdf},
  url = 	 {https://proceedings.mlr.press/v38/mei15.html},
  abstract = 	 {Latent Dirichlet allocation (LDA) is an increasingly popular tool for data analysis in many domains. If LDA output affects decision making (especially when money is involved), there is an incentive for attackers to compromise it. We ask the question: how can an attacker minimally poison the corpus so that LDA produces topics that the attacker wants the LDA user to see? Answering this question is important to characterize such attacks, and to develop defenses in the future. We give a novel bilevel optimization formulation to identify the optimal poisoning attack. We present an efficient solution (up to local optima) using descent method and implicit functions. We demonstrate poisoning attacks on LDA with extensive experiments, and discuss possible defenses.}
}

Endnote

%0 Conference Paper
%T The Security of Latent Dirichlet Allocation
%A Shike Mei
%A Xiaojin Zhu
%B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2015
%E Guy Lebanon
%E S. V. N. Vishwanathan
%F pmlr-v38-mei15
%I PMLR
%P 681--689
%U https://proceedings.mlr.press/v38/mei15.html
%V 38
%X Latent Dirichlet allocation (LDA) is an increasingly popular tool for data analysis in many domains. If LDA output affects decision making (especially when money is involved), there is an incentive for attackers to compromise it. We ask the question: how can an attacker minimally poison the corpus so that LDA produces topics that the attacker wants the LDA user to see? Answering this question is important to characterize such attacks, and to develop defenses in the future. We give a novel bilevel optimization formulation to identify the optimal poisoning attack. We present an efficient solution (up to local optima) using descent method and implicit functions. We demonstrate poisoning attacks on LDA with extensive experiments, and discuss possible defenses.

RIS


TY  - CPAPER
TI  - The Security of Latent Dirichlet Allocation
AU  - Shike Mei
AU  - Xiaojin Zhu
BT  - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
DA  - 2015/02/21
ED  - Guy Lebanon
ED  - S. V. N. Vishwanathan
ID  - pmlr-v38-mei15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 38
SP  - 681
EP  - 689
L1  - http://proceedings.mlr.press/v38/mei15.pdf
UR  - https://proceedings.mlr.press/v38/mei15.html
AB  - Latent Dirichlet allocation (LDA) is an increasingly popular tool for data analysis in many domains. If LDA output affects decision making (especially when money is involved), there is an incentive for attackers to compromise it. We ask the question: how can an attacker minimally poison the corpus so that LDA produces topics that the attacker wants the LDA user to see? Answering this question is important to characterize such attacks, and to develop defenses in the future. We give a novel bilevel optimization formulation to identify the optimal poisoning attack. We present an efficient solution (up to local optima) using descent method and implicit functions. We demonstrate poisoning attacks on LDA with extensive experiments, and discuss possible defenses.
ER  -

APA


Mei, S. & Zhu, X. (2015). The Security of Latent Dirichlet Allocation. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:681-689. Available from https://proceedings.mlr.press/v38/mei15.html.
