Statistical Preprocessing for Decision Tree Induction

Sreerama K. Murthy
Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, PMLR R0:403-409, 1995.

Abstract

Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly-used goodness measures are not equipped to take into account some patterns in numeric attribute spaces, and presents a framework for capturing some such patterns into decision tree induction. As a case study, it is demonstrated empirically that supervised clustering, when used as a preprocessing step, can improve the quality of both univariate and multivariate decision trees.
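The abstract's idea can be illustrated with a toy sketch. This is not the paper's actual algorithm or data — "supervised clustering" here simply means clustering each class separately and deriving one new numeric attribute from the per-class centroids; all names and data points below are invented for illustration. On an XOR-style layout, no single axis-parallel split on the raw attributes is pure, yet after preprocessing a one-threshold "tree" on the derived attribute separates the classes.

```python
# Hedged illustration, not the paper's method: cluster within each class,
# then derive a new attribute on which a single univariate split works.
import math

def kmeans(points, k, iters=20):
    """Tiny k-means, seeded with the first k points (adequate for toy data)."""
    cents = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, cents[i]))
            groups[j].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster empties out
                cents[i] = [sum(coord) / len(g) for coord in zip(*g)]
    return cents

def derived_attribute(x, cents_by_class):
    """Distance to the nearest class-0 centroid minus distance to the
    nearest class-1 centroid; negative means closer to class 0."""
    d0 = min(math.dist(x, c) for c in cents_by_class[0])
    d1 = min(math.dist(x, c) for c in cents_by_class[1])
    return d0 - d1

# XOR-style pattern: class 0 near (0,0) and (1,1), class 1 near (0,1) and (1,0).
# No single threshold on x1 or x2 alone separates the classes.
data = [((0.0, 0.0), 0), ((0.1, 0.1), 0), ((1.0, 1.0), 0), ((0.9, 1.1), 0),
        ((0.0, 1.0), 1), ((0.1, 0.9), 1), ((1.0, 0.0), 1), ((1.1, 0.1), 1)]

# Preprocessing: cluster each class separately (two clusters per class).
cents_by_class = {lab: kmeans([x for x, y in data if y == lab], k=2)
                  for lab in (0, 1)}

# A single-split "decision tree" on the derived attribute: threshold at 0.
preds = [0 if derived_attribute(x, cents_by_class) < 0 else 1 for x, _ in data]
```

The point of the sketch: a greedy goodness measure evaluating raw-attribute splits sees no good split anywhere on this data, while a statistic computed over the whole attribute space (here, proximity to class-specific centroids) exposes the pattern to an ordinary univariate tree.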

Cite this Paper


BibTeX
@InProceedings{pmlr-vR0-murthy95a,
  title     = {Statistical Preprocessing for Decision Tree Induction},
  author    = {Murthy, Sreerama K.},
  booktitle = {Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics},
  pages     = {403--409},
  year      = {1995},
  editor    = {Fisher, Doug and Lenz, Hans-Joachim},
  volume    = {R0},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--07 Jan},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/r0/murthy95a/murthy95a.pdf},
  url       = {https://proceedings.mlr.press/r0/murthy95a.html},
  abstract  = {Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly-used goodness measures are not equipped to take into account some patterns in numeric attribute spaces, and presents a framework for capturing some such patterns into decision tree induction. As a case study, it is demonstrated empirically that supervised clustering, when used as a preprocessing step, can improve the quality of both univariate and multivariate decision trees.},
  note      = {Reissued by PMLR on 01 May 2022.}
}
Endnote
%0 Conference Paper
%T Statistical Preprocessing for Decision Tree Induction
%A Sreerama K. Murthy
%B Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 1995
%E Doug Fisher
%E Hans-Joachim Lenz
%F pmlr-vR0-murthy95a
%I PMLR
%P 403--409
%U https://proceedings.mlr.press/r0/murthy95a.html
%V R0
%X Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly-used goodness measures are not equipped to take into account some patterns in numeric attribute spaces, and presents a framework for capturing some such patterns into decision tree induction. As a case study, it is demonstrated empirically that supervised clustering, when used as a preprocessing step, can improve the quality of both univariate and multivariate decision trees.
%Z Reissued by PMLR on 01 May 2022.
APA
Murthy, S. K. (1995). Statistical Preprocessing for Decision Tree Induction. Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R0:403-409. Available from https://proceedings.mlr.press/r0/murthy95a.html. Reissued by PMLR on 01 May 2022.