Entropy evaluation based on confidence intervals of frequency estimates : Application to the learning of decision trees

Mathieu Serrurier; Henri Prade

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1576-1584, 2015.

Abstract

Entropy gain is widely used for learning decision trees. However, as we go deeper down the tree, the examples become rarer and the faithfulness of entropy decreases. Thus, misleading choices and over-fitting may occur and the tree has to be adjusted by using an early-stop criterion or post-pruning algorithms. However, these methods still depend on the choices previously made, which may be unsatisfactory. We propose a new cumulative entropy function based on confidence intervals on frequency estimates that jointly considers the entropy of the probability distribution and the uncertainty around the estimation of its parameters. This function takes advantage of the ability of a possibility distribution to upper bound a family of probabilities previously estimated from a limited set of examples and of the link between possibilistic specificity order and entropy. The proposed measure has several advantages over the classical one. It makes significant split choices and provides a statistically relevant stopping criterion that allows the learning of trees whose size is well-suited w.r.t. the available data. On top of that, it also provides a reasonable estimator of the performance of a decision tree. Finally, we show that it can be used for designing a simple and efficient online learning algorithm.
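
The small-sample issue raised in the abstract can be illustrated with a short sketch. It is not the cumulative entropy function proposed in the paper, which relies on possibility distributions. As a swapped-in stand-in, it uses a Wilson score interval on a binary class frequency and takes the largest binary entropy compatible with that interval. The function names and the choice of z = 1.96 are assumptions made for illustration only.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution of class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z = 1.96 is an assumed level)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def pessimistic_binary_entropy(k, n, z=1.96):
    """Largest binary entropy over the Wilson interval (illustrative stand-in,
    not the measure defined in the paper)."""
    lo, hi = wilson_interval(k, n, z)
    # Binary entropy peaks at 0.5, so pick the point of [lo, hi] closest to 0.5.
    p = min(max(0.5, lo), hi)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


if __name__ == "__main__":
    # Both leaves are "pure", so the empirical entropy cannot tell them apart.
    print(entropy([2, 0]), entropy([200, 0]))
    # The interval-aware value separates the 2-example leaf from the 200-example leaf.
    print(pessimistic_binary_entropy(2, 2), pessimistic_binary_entropy(200, 200))
```

Under these assumptions, a pure leaf holding 2 examples and a pure leaf holding 200 examples receive the same empirical entropy, while the interval-aware value is much higher for the smaller leaf, which is the kind of uncertainty the abstract argues plain entropy gain ignores.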

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-serrurier15,
  title = 	 {Entropy evaluation based on confidence intervals of frequency estimates : Application to the learning of decision trees},
  author = 	 {Serrurier, Mathieu and Prade, Henri},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1576--1584},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/serrurier15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/serrurier15.html},
  abstract = 	 {Entropy gain is widely used for learning decision trees. However, as we go deeper down the tree, the examples become rarer and the faithfulness of entropy decreases. Thus, misleading choices and over-fitting may occur and the tree has to be adjusted by using an early-stop criterion or post-pruning algorithms. However, these methods still depend on the choices previously made, which may be unsatisfactory. We propose a new cumulative entropy function based on confidence intervals on frequency estimates that jointly considers the entropy of the probability distribution and the uncertainty around the estimation of its parameters. This function takes advantage of the ability of a possibility distribution to upper bound a family of probabilities previously estimated from a limited set of examples and of the link between possibilistic specificity order and entropy. The proposed measure has several advantages over the classical one. It makes significant split choices and provides a statistically relevant stopping criterion that allows the learning of trees whose size is well-suited w.r.t. the available data. On top of that, it also provides a reasonable estimator of the performance of a decision tree. Finally, we show that it can be used for designing a simple and efficient online learning algorithm.}
}

Endnote

%0 Conference Paper
%T Entropy evaluation based on confidence intervals of frequency estimates : Application to the learning of decision trees
%A Mathieu Serrurier
%A Henri Prade
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-serrurier15
%I PMLR
%P 1576--1584
%U https://proceedings.mlr.press/v37/serrurier15.html
%V 37
%X Entropy gain is widely used for learning decision trees. However, as we go deeper down the tree, the examples become rarer and the faithfulness of entropy decreases. Thus, misleading choices and over-fitting may occur and the tree has to be adjusted by using an early-stop criterion or post-pruning algorithms. However, these methods still depend on the choices previously made, which may be unsatisfactory. We propose a new cumulative entropy function based on confidence intervals on frequency estimates that jointly considers the entropy of the probability distribution and the uncertainty around the estimation of its parameters. This function takes advantage of the ability of a possibility distribution to upper bound a family of probabilities previously estimated from a limited set of examples and of the link between possibilistic specificity order and entropy. The proposed measure has several advantages over the classical one. It makes significant split choices and provides a statistically relevant stopping criterion that allows the learning of trees whose size is well-suited w.r.t. the available data. On top of that, it also provides a reasonable estimator of the performance of a decision tree. Finally, we show that it can be used for designing a simple and efficient online learning algorithm.

RIS


TY  - CPAPER
TI  - Entropy evaluation based on confidence intervals of frequency estimates : Application to the learning of decision trees
AU  - Mathieu Serrurier
AU  - Henri Prade
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-serrurier15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1576
EP  - 1584
L1  - http://proceedings.mlr.press/v37/serrurier15.pdf
UR  - https://proceedings.mlr.press/v37/serrurier15.html
AB  - Entropy gain is widely used for learning decision trees. However, as we go deeper down the tree, the examples become rarer and the faithfulness of entropy decreases. Thus, misleading choices and over-fitting may occur and the tree has to be adjusted by using an early-stop criterion or post-pruning algorithms. However, these methods still depend on the choices previously made, which may be unsatisfactory. We propose a new cumulative entropy function based on confidence intervals on frequency estimates that jointly considers the entropy of the probability distribution and the uncertainty around the estimation of its parameters. This function takes advantage of the ability of a possibility distribution to upper bound a family of probabilities previously estimated from a limited set of examples and of the link between possibilistic specificity order and entropy. The proposed measure has several advantages over the classical one. It makes significant split choices and provides a statistically relevant stopping criterion that allows the learning of trees whose size is well-suited w.r.t. the available data. On top of that, it also provides a reasonable estimator of the performance of a decision tree. Finally, we show that it can be used for designing a simple and efficient online learning algorithm.
ER  -

APA


Serrurier, M. & Prade, H. (2015). Entropy evaluation based on confidence intervals of frequency estimates : Application to the learning of decision trees. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1576-1584. Available from https://proceedings.mlr.press/v37/serrurier15.html.
