Small sample inference for generalization error in classification using the CUD bound

Eric B. Laber, Susan A. Murphy
Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, PMLR R6:357-365, 2008.

Abstract

Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the resampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996, Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the generalization error to form a confidence set. Unfortunately, these methods do not reliably provide sets of the desired confidence. The poor performance appears to be due to the lack of smoothness of the generalization error as a function of the learned classifier. This results in a non-normal distribution of the estimated generalization error. We construct a confidence set for the generalization error by use of a smooth upper bound on the deviation between the resampled estimate and generalization error. The confidence set is formed by bootstrapping this upper bound. In cases in which the approximation class for the classifier can be represented as a parametric additive model, we provide a computationally efficient algorithm. This method exhibits superior performance across a series of test and simulated data sets.
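To make the setup concrete, below is a minimal sketch of the *common approach* the abstract critiques: resample the training set, re-fit a classifier, and take percentile bootstrap quantiles of the resampled error estimate as a confidence set. This is not the paper's CUD bound; the toy data, the least-squares classifier, and all names here are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small training sample: 2-D features, binary labels.
n = 30
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

def train_and_error(Xb, yb, Xt, yt):
    """Fit a simple linear classifier (least squares on +/-1 labels)
    and return its misclassification rate on (Xt, yt)."""
    A = np.c_[np.ones(len(Xb)), Xb]
    w, *_ = np.linalg.lstsq(A, 2 * yb - 1, rcond=None)
    pred = (np.c_[np.ones(len(Xt)), Xt] @ w > 0).astype(int)
    return float(np.mean(pred != yt))

# Percentile-bootstrap confidence set for the error: resample the
# training data, refit, and evaluate on the original sample.
B = 500
errs = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)
    errs.append(train_and_error(X[idx], y[idx], X, y))
lo, hi = np.quantile(errs, [0.025, 0.975])
print(f"naive bootstrap 95% confidence set: [{lo:.3f}, {hi:.3f}]")
```

Because the misclassification indicator is a non-smooth function of the fitted classifier, the bootstrap distribution of `errs` need not be approximately normal in small samples, which is exactly the failure mode motivating the paper's smooth upper bound.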

Cite this Paper


BibTeX
@InProceedings{pmlr-vR6-laber08a,
  title     = {Small sample inference for generalization error in classification using the CUD bound},
  author    = {Laber, Eric B. and Murphy, Susan A.},
  booktitle = {Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence},
  pages     = {357--365},
  year      = {2008},
  editor    = {McAllester, David A. and Myllymäki, Petri},
  volume    = {R6},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/r6/main/assets/laber08a/laber08a.pdf},
  url       = {https://proceedings.mlr.press/r6/laber08a.html},
  abstract  = {Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the re-sampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996, Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the generalization error to form a confidence set. Unfortunately, these methods do not reliably provide sets of the desired confidence. The poor performance appears to be due to the lack of smoothness of the generalization error as a function of the learned classifier. This results in a non-normal distribution of the estimated generalization error. We construct a confidence set for the generalization error by use of a smooth upper bound on the deviation between the resampled estimate and generalization error. The confidence set is formed by bootstrapping this upper bound. In cases in which the approximation class for the classifier can be represented as a parametric additive model, we provide a computationally efficient algorithm. This method exhibits superior performance across a series of test and simulated data sets.},
  note      = {Reissued by PMLR on 09 October 2024.}
}
Endnote
%0 Conference Paper
%T Small sample inference for generalization error in classification using the CUD bound
%A Eric B. Laber
%A Susan A. Murphy
%B Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2008
%E David A. McAllester
%E Petri Myllymäki
%F pmlr-vR6-laber08a
%I PMLR
%P 357--365
%U https://proceedings.mlr.press/r6/laber08a.html
%V R6
%X Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the re-sampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996, Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the generalization error to form a confidence set. Unfortunately, these methods do not reliably provide sets of the desired confidence. The poor performance appears to be due to the lack of smoothness of the generalization error as a function of the learned classifier. This results in a non-normal distribution of the estimated generalization error. We construct a confidence set for the generalization error by use of a smooth upper bound on the deviation between the resampled estimate and generalization error. The confidence set is formed by bootstrapping this upper bound. In cases in which the approximation class for the classifier can be represented as a parametric additive model, we provide a computationally efficient algorithm. This method exhibits superior performance across a series of test and simulated data sets.
%Z Reissued by PMLR on 09 October 2024.
APA
Laber, E.B. & Murphy, S.A. (2008). Small sample inference for generalization error in classification using the CUD bound. Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research R6:357-365. Available from https://proceedings.mlr.press/r6/laber08a.html. Reissued by PMLR on 09 October 2024.