Accurate Robust and Efficient Error Estimation for Decision Trees

Lixin Fan

Accurate Robust and Efficient Error Estimation for Decision Trees

Lixin Fan

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:239-247, 2016.

Abstract

This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only if when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well known K-fold cross validation methods in terms of robustness and accuracy. Moreover it is orders of magnitudes more efficient than cross validation methods.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-fan16,
  title = 	 {Accurate Robust and Efficient Error Estimation for Decision Trees},
  author = 	 {Fan, Lixin},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {239--247},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/fan16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/fan16.html},
  abstract = 	 {This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only if when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well known K-fold cross validation methods in terms of robustness and accuracy. Moreover it is orders of magnitudes more efficient than cross validation methods.}
}

Endnote

%0 Conference Paper
%T Accurate Robust and Efficient Error Estimation for Decision Trees
%A Lixin Fan
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger	
%F pmlr-v48-fan16
%I PMLR
%P 239--247
%U https://proceedings.mlr.press/v48/fan16.html
%V 48
%X This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only if when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well known K-fold cross validation methods in terms of robustness and accuracy. Moreover it is orders of magnitudes more efficient than cross validation methods.

RIS


TY  - CPAPER
TI  - Accurate Robust and Efficient Error Estimation for Decision Trees
AU  - Lixin Fan
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger	
ID  - pmlr-v48-fan16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 239
EP  - 247
L1  - http://proceedings.mlr.press/v48/fan16.pdf
UR  - https://proceedings.mlr.press/v48/fan16.html
AB  - This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only if when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well known K-fold cross validation methods in terms of robustness and accuracy. Moreover it is orders of magnitudes more efficient than cross validation methods.
ER  -

APA


Fan, L.. (2016). Accurate Robust and Efficient Error Estimation for Decision Trees. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:239-247 Available from https://proceedings.mlr.press/v48/fan16.html.

Related Material

Download PDF