PAC Learning with Constant-Partition Classification Noise and Applications to Decision Tree Induction

Scott E. Decatur
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:147-156, 1997.

Abstract

We consider the problem of concept learning in Valiant’s PAC learning model in which the data used for learning is noisy. Specifically, we introduce a new model of noise called \emph{constant-partition classification noise} (CPCN) which generalizes the standard model of classification noise to allow different examples to have different rates of random misclassification. One example of CPCN-type noise is data with differing rates of false positives and false negatives. We then show how to learn in the presence of CPCN for any concept class learnable by statistical queries. This set of classes includes every concept class known to be learnable in the presence of standard classification noise. Our model is the first such non-uniform generalization of the standard classification noise model that allows efficient learning of this wide range of concept classes. We then examine standard methods of decision tree induction in the context of noisy data. We observe that the cores of commonly used algorithms such as ID3, CART, and C4.5 are not robust to CPCN, or even to standard classification noise. We therefore propose a simple modification to these algorithms in order to make them robust against CPCN. The modification is based on the statistical query techniques for CPCN described above.
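The abstract leaves the mechanics implicit, so here is a rough illustration (not the paper's actual construction) of the underlying idea in Python. With a known false-positive rate a and false-negative rate b, the observed positive-label rate q = p(1-b) + (1-p)a can be inverted to recover the true rate p = (q-a)/(1-a-b), and a decision tree split score such as information gain can then be computed from the corrected estimates. All function names are hypothetical, the noise rates are assumed known here, and the entropy-based gain is only one possible criterion; the paper's statistical query construction handles general constant partitions.

    import math

    def corrected_positive_rate(observed_rate, fp_rate, fn_rate):
        # Invert two-rate classification noise: if the true positive
        # probability is p, the observed rate is
        # q = p * (1 - fn_rate) + (1 - p) * fp_rate, so
        # p = (q - fp_rate) / (1 - fp_rate - fn_rate).
        denom = 1.0 - fp_rate - fn_rate
        if denom <= 0:
            raise ValueError("noise rates must satisfy fp_rate + fn_rate < 1")
        # Clamp to [0, 1]: sampling error can push the raw estimate outside.
        return min(1.0, max(0.0, (observed_rate - fp_rate) / denom))

    def entropy(p):
        # Binary entropy in bits.
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

    def corrected_gain(parent, left, right, fp_rate, fn_rate):
        # Information gain of a candidate split, computed from
        # noise-corrected positive rates rather than raw label counts:
        # a sketch of how an ID3/C4.5-style criterion could be made
        # robust to two-rate classification noise.
        def p_hat(labels):
            q = sum(labels) / len(labels)
            return corrected_positive_rate(q, fp_rate, fn_rate)

        n_l, n_r = len(left), len(right)
        n = n_l + n_r
        return entropy(p_hat(parent)) - (
            n_l / n * entropy(p_hat(left))
            + n_r / n * entropy(p_hat(right)))

    # Example: score a split of eight noisy labels with a 10% false
    # positive rate and a 20% false negative rate.
    parent = [1, 1, 0, 1, 0, 0, 1, 0]
    left, right = parent[:4], parent[4:]
    print(corrected_gain(parent, left, right, fp_rate=0.1, fn_rate=0.2))

The point of the sketch is that the correction touches only the probability estimates feeding the split criterion, which is why the modification to ID3/CART/C4.5 described in the abstract can be simple: the tree-growing control flow is unchanged.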

Cite this Paper


BibTeX
@InProceedings{pmlr-vR1-decatur97a, title = {PAC Learning with Constant-Partition Classification Noise and Applications to Decision Tree Induction}, author = {Decatur, Scott E.}, booktitle = {Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics}, pages = {147--156}, year = {1997}, editor = {Madigan, David and Smyth, Padhraic}, volume = {R1}, series = {Proceedings of Machine Learning Research}, month = {04--07 Jan}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/r1/decatur97a/decatur97a.pdf}, url = {https://proceedings.mlr.press/r1/decatur97a.html}, abstract = {We consider the problem of concept learning in Valiant’s PAC learning model in which the data used for learning is noisy. Specifically, we introduce a new model of noise called \emph{constant-partition classification noise} (CPCN) which generalizes the standard model of classification noise to allow different examples to have different rates of random misclassification. One example of CPCN-type noise is data with differing rates of false positives and false negatives. We then show how to learn in the presence of CPCN for any concept class learnable by statistical queries. This set of classes includes every concept class known to be learnable in the presence of standard classification noise. Our model is the first such non-uniform generalization of the standard classification noise model that allows efficient learning of this wide range of concept classes. We then examine standard methods of decision tree induction in the context of noisy data. We observe that the cores of commonly used algorithms such as ID3, CART, and C4.5 are not robust to CPCN, or even to standard classification noise. We therefore propose a simple modification to these algorithms in order to make them robust against CPCN. The modification is based on the statistical query techniques for CPCN described above.}, note = {Reissued by PMLR on 30 March 2021.} }
Endnote
%0 Conference Paper %T PAC Learning with Constant-Partition Classification Noise and Applications to Decision Tree Induction %A Scott E. Decatur %B Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 1997 %E David Madigan %E Padhraic Smyth %F pmlr-vR1-decatur97a %I PMLR %P 147--156 %U https://proceedings.mlr.press/r1/decatur97a.html %V R1 %X We consider the problem of concept learning in Valiant’s PAC learning model in which the data used for learning is noisy. Specifically, we introduce a new model of noise called \emph{constant-partition classification noise} (CPCN) which generalizes the standard model of classification noise to allow different examples to have different rates of random misclassification. One example of CPCN-type noise is data with differing rates of false positives and false negatives. We then show how to learn in the presence of CPCN for any concept class learnable by statistical queries. This set of classes includes every concept class known to be learnable in the presence of standard classification noise. Our model is the first such non-uniform generalization of the standard classification noise model that allows efficient learning of this wide range of concept classes. We then examine standard methods of decision tree induction in the context of noisy data. We observe that the cores of commonly used algorithms such as ID3, CART, and C4.5 are not robust to CPCN, or even to standard classification noise. We therefore propose a simple modification to these algorithms in order to make them robust against CPCN. The modification is based on the statistical query techniques for CPCN described above. %Z Reissued by PMLR on 30 March 2021.
APA
Decatur, S. E. (1997). PAC Learning with Constant-Partition Classification Noise and Applications to Decision Tree Induction. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R1:147-156. Available from https://proceedings.mlr.press/r1/decatur97a.html. Reissued by PMLR on 30 March 2021.