Statistical Preprocessing for Decision Tree Induction

Sreerama K. Murthy

Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, PMLR R0:403-409, 1995.

Abstract

Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly-used goodness measures are not equipped to take into account some patterns in numeric attribute spaces, and presents a framework for capturing some such patterns into decision tree induction. As a case study, it is demonstrated empirically that supervised clustering, when used as a preprocessing step, can improve the quality of both univariate and multivariate decision trees.

Cite this Paper

BibTeX


@InProceedings{pmlr-vR0-murthy95a,
  title     = {Statistical Preprocessing for Decision Tree Induction},
  author    = {Murthy, Sreerama K.},
  booktitle = {Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics},
  pages     = {403--409},
  year      = {1995},
  editor    = {Fisher, Doug and Lenz, Hans-Joachim},
  volume    = {R0},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--07 Jan},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/r0/murthy95a/murthy95a.pdf},
  url       = {https://proceedings.mlr.press/r0/murthy95a.html},
  abstract  = {Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly-used goodness measures are not equipped to take into account some patterns in numeric attribute spaces, and presents a framework for capturing some such patterns into decision tree induction. As a case study, it is demonstrated empirically that supervised clustering, when used as a preprocessing step, can improve the quality of both univariate and multivariate decision trees.},
  note      = {Reissued by PMLR on 01 May 2022.}
}

Endnote

%0 Conference Paper
%T Statistical Preprocessing for Decision Tree Induction
%A Sreerama K. Murthy
%B Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 1995
%E Doug Fisher
%E Hans-Joachim Lenz
%F pmlr-vR0-murthy95a
%I PMLR
%P 403--409
%U https://proceedings.mlr.press/r0/murthy95a.html
%V R0
%X Some apparently simple numeric data sets cause significant problems for existing decision tree induction algorithms, in that no method is able to find a small, accurate tree, even though one exists. One source of this difficulty is the goodness measures used to decide whether a particular node represents a good way to split the data. This paper points out that the commonly-used goodness measures are not equipped to take into account some patterns in numeric attribute spaces, and presents a framework for capturing some such patterns into decision tree induction. As a case study, it is demonstrated empirically that supervised clustering, when used as a preprocessing step, can improve the quality of both univariate and multivariate decision trees.
%Z Reissued by PMLR on 01 May 2022.

APA


Murthy, S. K. (1995). Statistical Preprocessing for Decision Tree Induction. Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R0:403-409. Available from https://proceedings.mlr.press/r0/murthy95a.html. Reissued by PMLR on 01 May 2022.
