Adjusting for Multiple Testing in Decision Tree Pruning
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:295-302, 1997.
Abstract
Overfitting is a widely observed pathology of induction algorithms. Overfitted models contain unnecessary structure that reflects nothing more than random variation in the data sample used to construct the model. Portions of these models are literally wrong, and can mislead users. Overfitted models are less efficient to store and use than their correctly-sized counterparts. Finally, overfitting can reduce the accuracy of induced models on new data [14, 7]. For induction algorithms that build decision trees [1, 13, 15], pruning is a common approach to correcting overfitting. Pruning techniques take an induced tree, examine individual subtrees, and remove those subtrees deemed to be unnecessary. While pruning techniques can differ in several respects, they primarily differ in the criterion used to judge subtrees. Many criteria have been proposed, including statistical significance tests [13], corrected error estimates [15], and minimum description length calculations [12]. Most common pruning techniques, however, do not account for one potentially important factor: multiple testing. Multiple testing occurs whenever an induction algorithm examines several candidate models and selects the one that best accords with the data. Any search process necessarily involves multiple testing, and most common induction algorithms involve implicit or explicit search through a space of candidate models. In the case of decision trees, search involves examining many possible subtrees and selecting the best one. Pruning techniques need to account for the number of subtrees examined, because such multiple testing affects the apparent accuracy of models on training data [8]. This paper examines the importance of adjusting for multiple testing. Specifically, it examines the effectiveness of one particular pruning method: Bonferroni pruning. Bonferroni pruning adjusts the results of a standard significance test to account for the number of subtrees examined at a particular node of a decision tree. Evidence that Bonferroni pruning leads to better models supports the hypothesis that multiple testing is an important cause of overfitting.
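To make the idea concrete, the sketch below shows how a Bonferroni adjustment can be applied to a per-node significance test during pruning. This is an illustration, not the paper's implementation: the function name, the alpha threshold, and the assumption that the criterion is a raw p-value from some significance test (e.g. a chi-square test of the split) are hypothetical choices not specified in the abstract.

    def bonferroni_prune(p_value, n_subtrees_examined, alpha=0.05):
        """Decide whether to keep a subtree after adjusting its
        significance test for multiple testing.

        p_value             -- raw p-value of a standard significance test
                               applied to the subtree (assumed input)
        n_subtrees_examined -- number of candidate subtrees examined at
                               this node
        alpha               -- significance level (hypothetical default)
        """
        # Bonferroni adjustment: scale the p-value by the number of
        # tests performed, capping at 1.0.
        adjusted_p = min(1.0, p_value * n_subtrees_examined)
        # Keep the subtree only if it remains significant after the
        # adjustment; otherwise it should be pruned.
        return adjusted_p < alpha

    # Example: a split with raw p = 0.01 found after examining 20 candidate
    # subtrees is no longer significant once adjusted (0.01 * 20 = 0.20),
    # so the subtree would be pruned.
    keep = bonferroni_prune(p_value=0.01, n_subtrees_examined=20)

The intuition is that the best of many examined subtrees will look good on the training data partly by chance, so the stronger evidence demanded by the adjusted test counteracts that optimistic bias.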