An Exact Probability Metric for Decision Tree Splitting and Stopping

J. Kent Martin
Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, PMLR R0:379-385, 1995.

Abstract

ID3’s information gain heuristic [16] is well known to be biased towards multi-valued attributes. This bias is only partially compensated by the gain ratio used in C4.5 [20]. Several alternatives have been proposed, notably orthogonality [9] and Beta [5]. Gain ratio and orthogonality are strongly correlated, and all of these metrics share a common bias towards splits with one or more small expected values, under circumstances where the split likely occurred by chance. Both classical and Bayesian statistics lead to the multiple hypergeometric distribution as the posterior probability of the null hypothesis. Both gain and the chi-squared significance test are shown to arise in asymptotic approximations to the hypergeometric, revealing similar criteria for admissibility and showing the nature of their biases. Previous failures to find admissible stopping rules in CART [3, pp. 59-66] and ID3 [20, pp. 36-37] are traced to coupling these biased approximations with one another or with arbitrary thresholds; these problems are overcome by the hypergeometric. Empirical results show that pre-pruning should be done: trees pruned in this way are simpler, more efficient, and no less accurate than unpruned trees. Average training time is reduced by up to 30%, and expensive post-pruning is avoided.
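To make the central quantity concrete: under the null hypothesis that a split's branches are independent of the class labels, the exact probability of the observed contingency table, conditioning on its row and column totals, is the multiple hypergeometric $P = (\prod_i r_i!)(\prod_j c_j!) / (N! \prod_{ij} n_{ij}!)$. The sketch below (a Python illustration of that formula, not the paper's code; the function names are assumptions of mine) evaluates it in log space for numerical stability.

# A minimal sketch of the exact multiple hypergeometric probability
# of a split's contingency table, computed via lgamma in log space.
from math import lgamma, exp

def log_factorial(n):
    # log(n!) = lgamma(n + 1)
    return lgamma(n + 1)

def hypergeometric_log_prob(table):
    # table[i][j] = count of class j in branch i of the candidate split.
    # Conditioning on the observed branch sizes (row totals) and class
    # totals (column totals), the null probability of this exact table is
    #   (prod_i r_i!) * (prod_j c_j!) / (N! * prod_ij n_ij!).
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    log_p = sum(log_factorial(r) for r in rows)
    log_p += sum(log_factorial(c) for c in cols)
    log_p -= log_factorial(n)
    log_p -= sum(log_factorial(x) for r in table for x in r)
    return log_p

# Example: a binary split of 20 examples over two classes.
split = [[8, 2],   # left branch: 8 of class A, 2 of class B
         [3, 7]]   # right branch: 3 of class A, 7 of class B
print(exp(hypergeometric_log_prob(split)))  # ~0.0322

A small value says the observed class separation is unlikely to have arisen by chance, favoring the split; because the probability conditions on the margins it is exact rather than asymptotic, which is the contrast the abstract draws with gain and the chi-squared test.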

Cite this Paper


BibTeX
@InProceedings{pmlr-vR0-martin95b,
  title     = {An Exact Probability Metric for Decision Tree Splitting and Stopping},
  author    = {Martin, J. Kent},
  booktitle = {Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics},
  pages     = {379--385},
  year      = {1995},
  editor    = {Fisher, Doug and Lenz, Hans-Joachim},
  volume    = {R0},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--07 Jan},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/r0/martin95b/martin95b.pdf},
  url       = {https://proceedings.mlr.press/r0/martin95b.html},
  abstract  = {ID3’s information gain heuristic [16] is well known to be biased towards multi-valued attributes. This bias is only partially compensated by the gain ratio used in C4.5 [20]. Several alternatives have been proposed, notably orthogonality [9] and Beta [5]. Gain ratio and orthogonality are strongly correlated, and all of these metrics share a common bias towards splits with one or more small expected values, under circumstances where the split likely occurred by chance. Both classical and Bayesian statistics lead to the multiple hypergeometric distribution as the posterior probability of the null hypothesis. Both gain and the chi-squared significance test are shown to arise in asymptotic approximations to the hypergeometric, revealing similar criteria for admissibility and showing the nature of their biases. Previous failures to find admissible stopping rules in CART [3, pp. 59-66] and ID3 [20, pp. 36-37] are traced to coupling these biased approximations with one another or with arbitrary thresholds; these problems are overcome by the hypergeometric. Empirical results show that pre-pruning should be done: trees pruned in this way are simpler, more efficient, and no less accurate than unpruned trees. Average training time is reduced by up to 30%, and expensive post-pruning is avoided.},
  note      = {Reissued by PMLR on 01 May 2022.}
}
Endnote
%0 Conference Paper
%T An Exact Probability Metric for Decision Tree Splitting and Stopping
%A J. Kent Martin
%B Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 1995
%E Doug Fisher
%E Hans-Joachim Lenz
%F pmlr-vR0-martin95b
%I PMLR
%P 379--385
%U https://proceedings.mlr.press/r0/martin95b.html
%V R0
%X ID3’s information gain heuristic [16] is well known to be biased towards multi-valued attributes. This bias is only partially compensated by the gain ratio used in C4.5 [20]. Several alternatives have been proposed, notably orthogonality [9] and Beta [5]. Gain ratio and orthogonality are strongly correlated, and all of these metrics share a common bias towards splits with one or more small expected values, under circumstances where the split likely occurred by chance. Both classical and Bayesian statistics lead to the multiple hypergeometric distribution as the posterior probability of the null hypothesis. Both gain and the chi-squared significance test are shown to arise in asymptotic approximations to the hypergeometric, revealing similar criteria for admissibility and showing the nature of their biases. Previous failures to find admissible stopping rules in CART [3, pp. 59-66] and ID3 [20, pp. 36-37] are traced to coupling these biased approximations with one another or with arbitrary thresholds; these problems are overcome by the hypergeometric. Empirical results show that pre-pruning should be done: trees pruned in this way are simpler, more efficient, and no less accurate than unpruned trees. Average training time is reduced by up to 30%, and expensive post-pruning is avoided.
%Z Reissued by PMLR on 01 May 2022.
APA
Martin, J. K. (1995). An Exact Probability Metric for Decision Tree Splitting and Stopping. Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R0:379-385. Available from https://proceedings.mlr.press/r0/martin95b.html. Reissued by PMLR on 01 May 2022.
