Exploiting the High Predictive Power of Multi-class Subgroups
Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:177-192, 2010.
Abstract
Subgroup discovery aims at finding subsets of a population whose class distribution differs significantly from the overall distribution. A number of multi-class subgroup discovery methods have previously been proposed, investigated and implemented in the CN2-MSD system. When a decision tree learner was applied using the induced subgroups as features, it produced accurate and compact predictive models, demonstrating the usefulness of the subgroups. In this paper we show that, given a significant, sufficient and diverse set of subgroups, no further learning phase is required to build a good predictive model. Our systematic study bridges the gap between rule learning and decision tree modelling by proposing RankFree-MSD, a method which uses the training information associated with the subgroups to form a simple tree-based probability estimator and ranker without an additional learning phase. Furthermore, we propose an efficient subgroup pruning algorithm, RankFree-Pruning, which prunes unimportant subgroups from the subgroup tree in order to reduce the number of subgroups and the size of the tree without decreasing predictive performance. Despite the simplicity of our approach, we experimentally show that its predictive performance is in general comparable to that of other decision tree and rule learners on 10 multi-class UCI data sets.
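To make the central idea concrete, the sketch below illustrates (under our own assumptions, not as the paper's actual CN2-MSD or RankFree-MSD implementation) how a set of subgroups organised as a tree could be turned into a probability estimator using nothing but the training class counts stored with each subgroup: a test instance is routed to the most specific covering subgroup, whose Laplace-smoothed class distribution serves as the prediction. The `SubgroupNode` structure, condition predicates and function names are illustrative assumptions.

```python
# Minimal sketch, assuming subgroups are stored in a tree where each node keeps
# the training class counts of the examples it covers. This is NOT the authors'
# implementation; it only illustrates "prediction without a further learning phase".
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class SubgroupNode:
    """A subgroup: a covering condition plus its training class counts (assumed structure)."""
    condition: Callable[[dict], bool]      # predicate over an instance's attribute dict
    class_counts: Dict[str, int]           # training examples per class covered by this subgroup
    children: List["SubgroupNode"] = field(default_factory=list)

def laplace_probs(counts: Dict[str, int], classes: List[str]) -> Dict[str, float]:
    """Laplace-corrected class probability estimates from raw training counts."""
    total = sum(counts.get(c, 0) for c in classes)
    k = len(classes)
    return {c: (counts.get(c, 0) + 1) / (total + k) for c in classes}

def predict_proba(root: SubgroupNode, instance: dict, classes: List[str]) -> Dict[str, float]:
    """Descend to the most specific subgroup covering the instance and return
    its smoothed training class distribution -- no additional model is learned."""
    node = root
    while True:
        child = next((c for c in node.children if c.condition(instance)), None)
        if child is None:
            return laplace_probs(node.class_counts, classes)
        node = child

# Tiny illustrative tree: a root covering everything, with one child subgroup.
root = SubgroupNode(
    condition=lambda x: True,
    class_counts={"a": 50, "b": 30, "c": 20},
    children=[SubgroupNode(lambda x: x.get("petal_len", 0) > 2.5, {"a": 2, "b": 25, "c": 18})],
)
print(predict_proba(root, {"petal_len": 4.0}, ["a", "b", "c"]))
```

Ranking then follows directly from these probability estimates (e.g. ordering instances by the estimated probability of a given class), which is why no separate learning step is needed once the subgroup tree and its counts are available.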