[edit]
PAC Learning with Constant-Partition Classification Noise and Applications to Decision Tree Induction
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:147-156, 1997.
Abstract
We consider the problem of concept learning in Valiant’s PAC learning model in which the data used for learning is noisy. Specifically, we introduce a new model of noise called \emph{constant-partition classification noise} (CPCN) which generalizes the standard model of classification noise to allow different examples to have different rates of random misclassification. One example of CPCN type noise is data with differing rates of false positives and false negatives. We then show how to learn in the presense of CPCN for any concept class learnable by statistical queries. This set of classes includes every concept class known to be learnable in the presense of standard classification noise. Our model is the first such non-uniform generalization of the standard classification noise model that allows efficient learning of this wide range of concept classes. We then examine standard methods of decision tree induction in the context of noisy data. We observe that the core of commonly used algorithms such as ID3, CART and c4.5 are not robust to CPCN noise, or even to standard classification noise. We therefore propose a simple modification to these algorithms in order to make them robust against CPCN. The modification is based on the statistical query techniques for CPCN described above.