Making Classifier Chains Resilient to Class Imbalance

Bin Liu; Grigorios Tsoumakas

Proceedings of The 10th Asian Conference on Machine Learning, PMLR 95:280-295, 2018.

Abstract

Class imbalance is an intrinsic characteristic of multi-label data. Most labels in multi-label datasets are associated with only a small number of training examples, far fewer than the size of the dataset. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, in order to improve the exploitation of majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches in a variety of evaluation metrics.
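The basic idea of coupling a per-label binary model with random undersampling can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `undersample_indices`, the 1:1 balancing ratio, the per-label seeding, and the toy label matrix `Y` are all assumptions made for illustration, and the sketch omits the chaining step in which earlier labels are appended as features for later ones.

```python
import random


def undersample_indices(labels, seed=0):
    """Keep every minority example and an equally sized random subset
    of the majority examples (1:1 ratio is an assumption of this sketch)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # The smaller group is treated as the minority class.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)


# Hypothetical toy label matrix: rows are examples, columns are labels.
Y = [
    [1, 0],
    [0, 0],
    [0, 1],
    [0, 0],
    [0, 0],
    [1, 0],
]

# Each label gets its own balanced subset of training example indices.
balanced = {
    j: undersample_indices([row[j] for row in Y], seed=j)
    for j in range(len(Y[0]))
}
```

Each binary model would then be trained only on the indices in `balanced[j]` rather than on all examples, which discards some majority examples per model; the extensions described in the abstract aim to make better use of those majority examples.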

Cite this Paper

BibTeX

@InProceedings{pmlr-v95-liu18c,
  title = {Making Classifier Chains Resilient to Class Imbalance},
  author = {Liu, Bin and Tsoumakas, Grigorios},
  booktitle = {Proceedings of The 10th Asian Conference on Machine Learning},
  pages = {280--295},
  year = {2018},
  editor = {Zhu, Jun and Takeuchi, Ichiro},
  volume = {95},
  series = {Proceedings of Machine Learning Research},
  month = {14--16 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v95/liu18c/liu18c.pdf},
  url = {https://proceedings.mlr.press/v95/liu18c.html},
  abstract = {Class imbalance is an intrinsic characteristic of multi-label data. Most of the labels in multi-label data sets are associated with a small number of training examples, much smaller compared to the size of the data set. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, in order to improve the exploitation of majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches in a variety of evaluation metrics.}
}

Endnote

%0 Conference Paper
%T Making Classifier Chains Resilient to Class Imbalance
%A Bin Liu
%A Grigorios Tsoumakas
%B Proceedings of The 10th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jun Zhu
%E Ichiro Takeuchi
%F pmlr-v95-liu18c
%I PMLR
%P 280--295
%U https://proceedings.mlr.press/v95/liu18c.html
%V 95
%X Class imbalance is an intrinsic characteristic of multi-label data. Most of the labels in multi-label data sets are associated with a small number of training examples, much smaller compared to the size of the data set. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, in order to improve the exploitation of majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches in a variety of evaluation metrics.

APA

Liu, B. & Tsoumakas, G. (2018). Making Classifier Chains Resilient to Class Imbalance. Proceedings of The 10th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 95:280-295. Available from https://proceedings.mlr.press/v95/liu18c.html.
