Effect of Data Imbalance on Unsupervised Domain Adaptation of Part-of-Speech Tagging and Pivot Selection Strategies

Xia Cui; Frans Coenen; Danushka Bollegala

Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 74:103-115, 2017.

Abstract

Domain adaptation is the task of transforming a model trained using data from a source domain to a different target domain. In Unsupervised Domain Adaptation (UDA), we do not assume any labelled training data from the target domain. In this paper, we consider the problem of UDA in the context of Part-of-Speech (POS) tagging. Specifically, we study the effect of data imbalance on UDA of POS tagging, and compare different pivot selection strategies for accurately adapting a POS tagger trained using some source domain data to a target domain. We propose the use of F-score to select pivots using available labelled data in the source domain. Our experimental results on a benchmark dataset for cross-domain POS tagging show that using frequency combined with F-scores for selecting pivots in the source labelled data produces the best results.

Cite this Paper

BibTeX

@InProceedings{pmlr-v74-cui17a,
  title = 	 {Effect of Data Imbalance on Unsupervised Domain Adaptation of Part-of-Speech Tagging and Pivot Selection Strategies},
  author = 	 {Cui, Xia and Coenen, Frans and Bollegala, Danushka},
  booktitle = 	 {Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages = 	 {103--115},
  year = 	 {2017},
  editor = 	 {Torgo, Luís and Branco, Paula and Moniz, Nuno},
  volume = 	 {74},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {22 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v74/cui17a/cui17a.pdf},
  url = 	 {https://proceedings.mlr.press/v74/cui17a.html},
  abstract = 	 {Domain adaptation is the task of transforming a model trained using data from a source domain to a different target domain. In Unsupervised Domain Adaptation (UDA), we do not assume any labelled training data from the target domain. In this paper, we consider the problem of UDA in the context of Part-of-Speech (POS) tagging. Specifically, we study the effect of data imbalance on UDA of POS tagging, and compare different pivot selection strategies for accurately adapting a POS tagger trained using some source domain data to a target domain. We propose the use of F-score to select pivots using available labelled data in the source domain. Our experimental results on a benchmark dataset for cross-domain POS tagging show that using frequency combined with F-scores for selecting pivots in the source labelled data produces the best results.}
}

Endnote

%0 Conference Paper
%T Effect of Data Imbalance on Unsupervised Domain Adaptation of Part-of-Speech Tagging and Pivot Selection Strategies
%A Xia Cui
%A Frans Coenen
%A Danushka Bollegala
%B Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2017
%E Luís Torgo
%E Paula Branco
%E Nuno Moniz
%F pmlr-v74-cui17a
%I PMLR
%P 103--115
%U https://proceedings.mlr.press/v74/cui17a.html
%V 74
%X Domain adaptation is the task of transforming a model trained using data from a source domain to a different target domain. In Unsupervised Domain Adaptation (UDA), we do not assume any labelled training data from the target domain. In this paper, we consider the problem of UDA in the context of Part-of-Speech (POS) tagging. Specifically, we study the effect of data imbalance on UDA of POS tagging, and compare different pivot selection strategies for accurately adapting a POS tagger trained using some source domain data to a target domain. We propose the use of F-score to select pivots using available labelled data in the source domain. Our experimental results on a benchmark dataset for cross-domain POS tagging show that using frequency combined with F-scores for selecting pivots in the source labelled data produces the best results.

APA

Cui, X., Coenen, F., & Bollegala, D. (2017). Effect of Data Imbalance on Unsupervised Domain Adaptation of Part-of-Speech Tagging and Pivot Selection Strategies. Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 74:103-115. Available from https://proceedings.mlr.press/v74/cui17a.html.
