Exploiting the High Predictive Power of Multi-class Subgroups

Tarek Abudawood, Peter Flach
Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:177-192, 2010.

Abstract

Subgroup discovery aims at finding subsets of a population whose class distribution is significantly different from the overall distribution. A number of multi-class subgroup discovery methods have been previously investigated, proposed and implemented in the CN2-MSD system. When a decision tree learner was applied using the induced subgroups as features, it led to the construction of accurate and compact predictive models, demonstrating the usefulness of the subgroups. In this paper we show that, given a significant, sufficient and diverse set of subgroups, no further learning phase is required to build a good predictive model. Our systematic study bridges the gap between rule learning and decision tree modelling by proposing a method which uses the training information associated with the subgroups to form a simple tree-based probability estimator and ranker, RankFree-MSD, without the need for an additional learning phase. Furthermore, we propose an efficient subgroup pruning algorithm, RankFree-Pruning, that prunes unimportant subgroups from the subgroup tree in order to reduce the number of subgroups and the size of the tree without decreasing predictive performance. Despite the simplicity of our approach, we experimentally show that its predictive performance is in general comparable to other decision tree and rule learners over 10 multi-class UCI data sets.
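As a minimal illustration of the kind of quality measure used in subgroup discovery (this sketch is not taken from the paper, which studies multi-class measures; shown here in the standard two-class form), weighted relative accuracy (WRAcc) scores a candidate subgroup by how far its class distribution deviates from the overall distribution, weighted by coverage:

```python
def wracc(n_total, n_pos, n_sub, n_sub_pos):
    """Weighted relative accuracy of a candidate subgroup.

    WRAcc = coverage * (positive rate inside the subgroup
                        - positive rate in the whole population).
    Positive values mean the subgroup is enriched for the target class.
    """
    coverage = n_sub / n_total
    return coverage * (n_sub_pos / n_sub - n_pos / n_total)

# Example: 100 examples, 30 positive overall; a subgroup covering 20
# examples of which 15 are positive deviates strongly from the base rate.
print(round(wracc(100, 30, 20, 15), 3))  # 0.2 * (0.75 - 0.30) = 0.09
```

A subgroup discovery system searches for descriptions maximising such a measure, subject to significance and diversity constraints as discussed in the abstract.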

Cite this Paper


BibTeX
@InProceedings{pmlr-v13-abudawood10a,
  title     = {Exploiting the High Predictive Power of Multi-class Subgroups},
  author    = {Abudawood, Tarek and Flach, Peter},
  booktitle = {Proceedings of 2nd Asian Conference on Machine Learning},
  pages     = {177--192},
  year      = {2010},
  editor    = {Sugiyama, Masashi and Yang, Qiang},
  volume    = {13},
  series    = {Proceedings of Machine Learning Research},
  address   = {Tokyo, Japan},
  month     = {08--10 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v13/abudawood10a/abudawood10a.pdf},
  url       = {https://proceedings.mlr.press/v13/abudawood10a.html},
  abstract  = {Subgroup discovery aims at finding subsets of a population whose class distribution is significantly different from the overall distribution. A number of multi-class subgroup discovery methods have been previously investigated, proposed and implemented in the CN2-MSD system. When a decision tree learner was applied using the induced subgroups as features, it led to the construction of accurate and compact predictive models, demonstrating the usefulness of the subgroups. In this paper we show that, given a significant, sufficient and diverse set of subgroups, no further learning phase is required to build a good predictive model. Our systematic study bridges the gap between rule learning and decision tree modelling by proposing a method which uses the training information associated with the subgroups to form a simple tree-based probability estimator and ranker, RankFree-MSD, without the need for an additional learning phase. Furthermore, we propose an efficient subgroup pruning algorithm, RankFree-Pruning, that prunes unimportant subgroups from the subgroup tree in order to reduce the number of subgroups and the size of the tree without decreasing predictive performance. Despite the simplicity of our approach, we experimentally show that its predictive performance is in general comparable to other decision tree and rule learners over 10 multi-class UCI data sets.}
}
Endnote
%0 Conference Paper
%T Exploiting the High Predictive Power of Multi-class Subgroups
%A Tarek Abudawood
%A Peter Flach
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang
%F pmlr-v13-abudawood10a
%I PMLR
%P 177--192
%U https://proceedings.mlr.press/v13/abudawood10a.html
%V 13
%X Subgroup discovery aims at finding subsets of a population whose class distribution is significantly different from the overall distribution. A number of multi-class subgroup discovery methods have been previously investigated, proposed and implemented in the CN2-MSD system. When a decision tree learner was applied using the induced subgroups as features, it led to the construction of accurate and compact predictive models, demonstrating the usefulness of the subgroups. In this paper we show that, given a significant, sufficient and diverse set of subgroups, no further learning phase is required to build a good predictive model. Our systematic study bridges the gap between rule learning and decision tree modelling by proposing a method which uses the training information associated with the subgroups to form a simple tree-based probability estimator and ranker, RankFree-MSD, without the need for an additional learning phase. Furthermore, we propose an efficient subgroup pruning algorithm, RankFree-Pruning, that prunes unimportant subgroups from the subgroup tree in order to reduce the number of subgroups and the size of the tree without decreasing predictive performance. Despite the simplicity of our approach, we experimentally show that its predictive performance is in general comparable to other decision tree and rule learners over 10 multi-class UCI data sets.
RIS
TY - CPAPER
TI - Exploiting the High Predictive Power of Multi-class Subgroups
AU - Tarek Abudawood
AU - Peter Flach
BT - Proceedings of 2nd Asian Conference on Machine Learning
DA - 2010/10/31
ED - Masashi Sugiyama
ED - Qiang Yang
ID - pmlr-v13-abudawood10a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 13
SP - 177
EP - 192
L1 - http://proceedings.mlr.press/v13/abudawood10a/abudawood10a.pdf
UR - https://proceedings.mlr.press/v13/abudawood10a.html
AB - Subgroup discovery aims at finding subsets of a population whose class distribution is significantly different from the overall distribution. A number of multi-class subgroup discovery methods have been previously investigated, proposed and implemented in the CN2-MSD system. When a decision tree learner was applied using the induced subgroups as features, it led to the construction of accurate and compact predictive models, demonstrating the usefulness of the subgroups. In this paper we show that, given a significant, sufficient and diverse set of subgroups, no further learning phase is required to build a good predictive model. Our systematic study bridges the gap between rule learning and decision tree modelling by proposing a method which uses the training information associated with the subgroups to form a simple tree-based probability estimator and ranker, RankFree-MSD, without the need for an additional learning phase. Furthermore, we propose an efficient subgroup pruning algorithm, RankFree-Pruning, that prunes unimportant subgroups from the subgroup tree in order to reduce the number of subgroups and the size of the tree without decreasing predictive performance. Despite the simplicity of our approach, we experimentally show that its predictive performance is in general comparable to other decision tree and rule learners over 10 multi-class UCI data sets.
ER -
APA
Abudawood, T. & Flach, P. (2010). Exploiting the High Predictive Power of Multi-class Subgroups. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:177-192. Available from https://proceedings.mlr.press/v13/abudawood10a.html.

Related Material