Accurate Robust and Efficient Error Estimation for Decision Trees

Lixin Fan
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:239-247, 2016.

Abstract

This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only if when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well known K-fold cross validation methods in terms of robustness and accuracy. Moreover it is orders of magnitudes more efficient than cross validation methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-fan16, title = {Accurate Robust and Efficient Error Estimation for Decision Trees}, author = {Fan, Lixin}, booktitle = {Proceedings of The 33rd International Conference on Machine Learning}, pages = {239--247}, year = {2016}, editor = {Balcan, Maria Florina and Weinberger, Kilian Q.}, volume = {48}, series = {Proceedings of Machine Learning Research}, address = {New York, New York, USA}, month = {20--22 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v48/fan16.pdf}, url = { http://proceedings.mlr.press/v48/fan16.html }, abstract = {This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only if when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well known K-fold cross validation methods in terms of robustness and accuracy. Moreover it is orders of magnitudes more efficient than cross validation methods.} }
Endnote
%0 Conference Paper %T Accurate Robust and Efficient Error Estimation for Decision Trees %A Lixin Fan %B Proceedings of The 33rd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2016 %E Maria Florina Balcan %E Kilian Q. Weinberger %F pmlr-v48-fan16 %I PMLR %P 239--247 %U http://proceedings.mlr.press/v48/fan16.html %V 48 %X This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only if when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well known K-fold cross validation methods in terms of robustness and accuracy. Moreover it is orders of magnitudes more efficient than cross validation methods.
RIS
TY - CPAPER TI - Accurate Robust and Efficient Error Estimation for Decision Trees AU - Lixin Fan BT - Proceedings of The 33rd International Conference on Machine Learning DA - 2016/06/11 ED - Maria Florina Balcan ED - Kilian Q. Weinberger ID - pmlr-v48-fan16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 48 SP - 239 EP - 247 L1 - http://proceedings.mlr.press/v48/fan16.pdf UR - http://proceedings.mlr.press/v48/fan16.html AB - This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only if when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well known K-fold cross validation methods in terms of robustness and accuracy. Moreover it is orders of magnitudes more efficient than cross validation methods. ER -
APA
Fan, L.. (2016). Accurate Robust and Efficient Error Estimation for Decision Trees. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:239-247 Available from http://proceedings.mlr.press/v48/fan16.html .

Related Material