On the Consistency Rate of Decision Tree Learning Algorithms

Qin-Cheng Zheng, Shen-Huan Lyu, Shao-Qun Zhang, Yuan Jiang, Zhi-Hua Zhou
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7824-7848, 2023.

Abstract

Decision tree learning algorithms such as CART are generally based on heuristics that greedily maximize the purity gain. Although these algorithms are practically successful, their theoretical properties, such as consistency, are far from clear. In this paper, we find that the most serious obstacle to consistency analysis for decision tree learning algorithms is the fact that the worst-case purity gain, i.e., the core heuristic for tree splitting, can be zero. Based on this observation, we present a new algorithm, named Grid Classification And Regression Tree (GridCART), with a provable consistency rate $\mathcal{O}(n^{-1 / (d + 2)})$, which is the first consistency rate proved for heuristic tree learning algorithms.
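To make the heuristic concrete: the sketch below shows a standard CART-style greedy split based on Gini purity gain (this is the generic heuristic the abstract refers to, not the paper's GridCART algorithm; the function names are illustrative). It also demonstrates the obstacle the paper identifies: on an XOR-like sample, every axis-aligned split yields zero purity gain, so the greedy heuristic has no preferred split.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector: 1 - sum_k p_k^2."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def purity_gain(y, y_left, y_right):
    """Reduction in Gini impurity achieved by a candidate split."""
    n = len(y)
    return (gini(y)
            - (len(y_left) / n) * gini(y_left)
            - (len(y_right) / n) * gini(y_right))

def best_split(X, y):
    """Greedily pick the (feature, threshold) maximizing purity gain."""
    best = (None, None, 0.0)  # (feature index, threshold, gain)
    n, d = X.shape
    for j in range(d):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            mask = X[:, j] <= t
            gain = purity_gain(y, y[mask], y[~mask])
            if gain > best[2]:
                best = (j, t, gain)
    return best

# A separable sample: the best split cleanly separates the classes.
X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))   # feature 0, threshold 1.0, gain 0.5

# The obstacle: on XOR data, every axis-aligned split has zero gain,
# so the greedy heuristic cannot choose a split at all.
X_xor = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_xor = np.array([0, 1, 1, 0])
print(best_split(X_xor, y_xor))  # (None, None, 0.0)
```

The XOR example is exactly the kind of worst case that makes consistency analysis of purity-gain heuristics difficult, which motivates the grid-based construction in GridCART.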

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-zheng23b,
  title     = {On the Consistency Rate of Decision Tree Learning Algorithms},
  author    = {Zheng, Qin-Cheng and Lyu, Shen-Huan and Zhang, Shao-Qun and Jiang, Yuan and Zhou, Zhi-Hua},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {7824--7848},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/zheng23b/zheng23b.pdf},
  url       = {https://proceedings.mlr.press/v206/zheng23b.html},
  abstract  = {Decision tree learning algorithms such as CART are generally based on heuristics that greedily maximize the purity gain. Although these algorithms are practically successful, their theoretical properties, such as consistency, are far from clear. In this paper, we find that the most serious obstacle to consistency analysis for decision tree learning algorithms is the fact that the worst-case purity gain, i.e., the core heuristic for tree splitting, can be zero. Based on this observation, we present a new algorithm, named Grid Classification And Regression Tree (GridCART), with a provable consistency rate $\mathcal{O}(n^{-1 / (d + 2)})$, which is the first consistency rate proved for heuristic tree learning algorithms.}
}
Endnote
%0 Conference Paper
%T On the Consistency Rate of Decision Tree Learning Algorithms
%A Qin-Cheng Zheng
%A Shen-Huan Lyu
%A Shao-Qun Zhang
%A Yuan Jiang
%A Zhi-Hua Zhou
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-zheng23b
%I PMLR
%P 7824--7848
%U https://proceedings.mlr.press/v206/zheng23b.html
%V 206
%X Decision tree learning algorithms such as CART are generally based on heuristics that greedily maximize the purity gain. Although these algorithms are practically successful, their theoretical properties, such as consistency, are far from clear. In this paper, we find that the most serious obstacle to consistency analysis for decision tree learning algorithms is the fact that the worst-case purity gain, i.e., the core heuristic for tree splitting, can be zero. Based on this observation, we present a new algorithm, named Grid Classification And Regression Tree (GridCART), with a provable consistency rate $\mathcal{O}(n^{-1 / (d + 2)})$, which is the first consistency rate proved for heuristic tree learning algorithms.
APA
Zheng, Q.-C., Lyu, S.-H., Zhang, S.-Q., Jiang, Y. & Zhou, Z.-H. (2023). On the Consistency Rate of Decision Tree Learning Algorithms. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:7824-7848. Available from https://proceedings.mlr.press/v206/zheng23b.html.

Related Material