Binary Partitions with Approximate Minimum Impurity

Eduardo Laber, Marco Molinaro, Felipe Mello Pereira
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2854-2862, 2018.

Abstract

The problem of splitting attributes is one of the main steps in the construction of decision trees. In order to decide the best split, impurity measures such as Entropy and Gini are widely used. In practice, decision-tree inducers use heuristics for finding splits with small impurity when they consider nominal attributes with a large number of distinct values. However, there are no known guarantees for the quality of the splits obtained by these heuristics. To fill this gap, we propose two new splitting procedures that provably achieve near-optimal impurity. We also report experiments that provide evidence that the proposed methods are interesting candidates to be employed in splitting nominal attributes with many values during decision tree/random forest induction.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-laber18a, title = {Binary Partitions with Approximate Minimum Impurity}, author = {Laber, Eduardo and Molinaro, Marco and Pereira, Felipe Mello}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {2854--2862}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/laber18a/laber18a.pdf}, url = {https://proceedings.mlr.press/v80/laber18a.html}, abstract = {The problem of splitting attributes is one of the main steps in the construction of decision trees. In order to decide the best split, impurity measures such as Entropy and Gini are widely used. In practice, decision-tree inducers use heuristics for finding splits with small impurity when they consider nominal attributes with a large number of distinct values. However, there are no known guarantees for the quality of the splits obtained by these heuristics. To fill this gap, we propose two new splitting procedures that provably achieve near-optimal impurity. We also report experiments that provide evidence that the proposed methods are interesting candidates to be employed in splitting nominal attributes with many values during decision tree/random forest induction.} }
Endnote
%0 Conference Paper %T Binary Partitions with Approximate Minimum Impurity %A Eduardo Laber %A Marco Molinaro %A Felipe Mello Pereira %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-laber18a %I PMLR %P 2854--2862 %U https://proceedings.mlr.press/v80/laber18a.html %V 80 %X The problem of splitting attributes is one of the main steps in the construction of decision trees. In order to decide the best split, impurity measures such as Entropy and Gini are widely used. In practice, decision-tree inducers use heuristics for finding splits with small impurity when they consider nominal attributes with a large number of distinct values. However, there are no known guarantees for the quality of the splits obtained by these heuristics. To fill this gap, we propose two new splitting procedures that provably achieve near-optimal impurity. We also report experiments that provide evidence that the proposed methods are interesting candidates to be employed in splitting nominal attributes with many values during decision tree/random forest induction.
APA
Laber, E., Molinaro, M. & Pereira, F.M.. (2018). Binary Partitions with Approximate Minimum Impurity. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:2854-2862 Available from https://proceedings.mlr.press/v80/laber18a.html.

Related Material