Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism

Si-Chi Chin; W. Nick Street

Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism

Si-Chi Chin, W. Nick Street

Proceedings of ICML Workshop on Unsupervised and Transfer Learning, PMLR 27:133-144, 2012.

Abstract

The paper applies knowledge transfer methods to the problem of detecting Wikipedia vandalism detection, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from a source task. However, the characteristics of Wikipedia vandalism are heterogeneous, ranging from a small replacement of a letter to a massive deletion of text. Selecting an informative subset from the source task to avoid potential negative transfer becomes a primary concern given this heterogeneous nature. The paper explores knowledge transfer methods to generalize learned models from a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. The two novel segmented transfer (ST) approaches map unlabeled data from the target task to the most related cluster from the source task, classifying the unlabeled data using the most relevant learned models.

Cite this Paper

BibTeX


@InProceedings{pmlr-v27-chin12a,
  title = 	 {Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism},
  author = 	 {Chin, Si-Chi and Street, W. Nick},
  booktitle = 	 {Proceedings of ICML Workshop on Unsupervised and Transfer Learning},
  pages = 	 {133--144},
  year = 	 {2012},
  editor = 	 {Guyon, Isabelle and Dror, Gideon and Lemaire, Vincent and Taylor, Graham and Silver, Daniel},
  volume = 	 {27},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bellevue, Washington, USA},
  month = 	 {02 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v27/chin12a/chin12a.pdf},
  url = 	 {https://proceedings.mlr.press/v27/chin12a.html},
  abstract = 	 {The paper applies knowledge transfer methods to the problem of detecting Wikipedia vandalism detection, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from a source task. However, the characteristics of Wikipedia vandalism are heterogeneous, ranging from a small replacement of a letter to a massive deletion of text. Selecting an informative subset from the source task to avoid potential negative transfer becomes a primary concern given this heterogeneous nature. The paper explores knowledge transfer methods to generalize learned models from a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. The two novel segmented transfer (ST) approaches map unlabeled data from the target task to the most related cluster from the source task, classifying the unlabeled data using the most relevant learned models.}
}

Endnote

%0 Conference Paper
%T Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism
%A Si-Chi Chin
%A W. Nick Street
%B Proceedings of ICML Workshop on Unsupervised and Transfer Learning
%C Proceedings of Machine Learning Research
%D 2012
%E Isabelle Guyon
%E Gideon Dror
%E Vincent Lemaire
%E Graham Taylor
%E Daniel Silver	
%F pmlr-v27-chin12a
%I PMLR
%P 133--144
%U https://proceedings.mlr.press/v27/chin12a.html
%V 27
%X The paper applies knowledge transfer methods to the problem of detecting Wikipedia vandalism detection, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from a source task. However, the characteristics of Wikipedia vandalism are heterogeneous, ranging from a small replacement of a letter to a massive deletion of text. Selecting an informative subset from the source task to avoid potential negative transfer becomes a primary concern given this heterogeneous nature. The paper explores knowledge transfer methods to generalize learned models from a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. The two novel segmented transfer (ST) approaches map unlabeled data from the target task to the most related cluster from the source task, classifying the unlabeled data using the most relevant learned models.

RIS


TY  - CPAPER
TI  - Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism
AU  - Si-Chi Chin
AU  - W. Nick Street
BT  - Proceedings of ICML Workshop on Unsupervised and Transfer Learning
DA  - 2012/06/27
ED  - Isabelle Guyon
ED  - Gideon Dror
ED  - Vincent Lemaire
ED  - Graham Taylor
ED  - Daniel Silver	
ID  - pmlr-v27-chin12a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 27
SP  - 133
EP  - 144
L1  - http://proceedings.mlr.press/v27/chin12a/chin12a.pdf
UR  - https://proceedings.mlr.press/v27/chin12a.html
AB  - The paper applies knowledge transfer methods to the problem of detecting Wikipedia vandalism detection, defined as malicious editing intended to compromise the integrity of the content of articles. A major challenge of detecting Wikipedia vandalism is the lack of a large amount of labeled training data. Knowledge transfer addresses this challenge by leveraging previously acquired knowledge from a source task. However, the characteristics of Wikipedia vandalism are heterogeneous, ranging from a small replacement of a letter to a massive deletion of text. Selecting an informative subset from the source task to avoid potential negative transfer becomes a primary concern given this heterogeneous nature. The paper explores knowledge transfer methods to generalize learned models from a heterogeneous dataset to a more uniform dataset while avoiding negative transfer. The two novel segmented transfer (ST) approaches map unlabeled data from the target task to the most related cluster from the source task, classifying the unlabeled data using the most relevant learned models.
ER  -

APA


Chin, S. & Street, W.N.. (2012). Divide and Transfer: an Exploration of Segmented Transfer to Detect Wikipedia Vandalism. Proceedings of ICML Workshop on Unsupervised and Transfer Learning, in Proceedings of Machine Learning Research 27:133-144 Available from https://proceedings.mlr.press/v27/chin12a.html.

Related Material

Download PDF