On the Error Probability of Model Selection for Classification

Joe Suzuki
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:513-520, 1997.

Abstract

We consider model selection based on information criteria for classification. The information criterion is expressed as the empirical entropy plus a compensation term $(k(g)/2)d(n)$, where $k(g)$ is the number of independent parameters in a model $g$, $d(n)$ is a function of $n$, and $n$ is the number of examples. First, we derive, for arbitrary $d(\cdot)$, the asymptotically exact error probabilities of the model selection. Although it was known for linear/autoregressive processes that $d(n) = \log \log n$ is the minimum function of $n$ for which the model selection is strongly consistent, whether the same holds for classification had remained an open problem. We solve this problem affirmatively. Additionally, we derive, for arbitrary $d(\cdot)$, the expected Kullback-Leibler divergence between the true conditional probability and the conditional probability estimated via the model selection and the Laplace estimators. The derived value is $k(g^*)/(2n)$, where $g^*$ is a true model, and the accumulated value over $n$ time instances is $(k(g^*)/2)\log n + O(1)$, which implies the optimality of a predictive coding scheme based on the model selection.

Keywords: model selection, error probability, strong consistency, Kullback-Leibler divergence, minimum description length principle, Hannan/Quinn's procedure, unseparated/separated models, Kolmogorov's law of the iterated logarithm.
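To make the criterion concrete, here is a minimal sketch (not from the paper) of how it might be computed for a discrete classification problem. The model class (subsets of discrete features that the class label may depend on), the function names, and the reading of "empirical entropy" as $n$ times the empirical conditional entropy (the maximized negative log-likelihood) are assumptions for illustration. The two penalty choices correspond to the MDL-type $d(n) = \log n$ and the Hannan/Quinn-type $d(n) = \log \log n$ discussed in the abstract, and the Laplace (add-one) estimator plays the role of the estimated conditional probability.

```python
import math
from collections import Counter

# Minimal sketch (not the paper's code): score a candidate model g by
# empirical entropy plus the penalty (k(g)/2) * d(n). A "model" is
# hypothetically a subset of discrete features the class label depends on.

def empirical_entropy(xs, ys, feature_subset):
    """n * H_n(Y | X restricted to feature_subset), in nats."""
    joint, marg = Counter(), Counter()
    for x, y in zip(xs, ys):
        key = tuple(x[i] for i in feature_subset)
        joint[(key, y)] += 1
        marg[key] += 1
    return -sum(c * math.log(c / marg[key]) for (key, _), c in joint.items())

def num_params(feature_subset, feature_sizes, num_classes):
    """k(g): number of independent conditional-probability parameters."""
    cells = 1
    for i in feature_subset:
        cells *= feature_sizes[i]
    return cells * (num_classes - 1)

def criterion(xs, ys, feature_subset, feature_sizes, num_classes, d):
    """Information criterion: empirical entropy + (k(g)/2) * d(n)."""
    n = len(ys)
    return (empirical_entropy(xs, ys, feature_subset)
            + 0.5 * num_params(feature_subset, feature_sizes, num_classes) * d(n))

# Two choices of d(.) discussed in the abstract:
mdl = lambda n: math.log(n)                      # MDL/BIC-type penalty
hannan_quinn = lambda n: math.log(math.log(n))   # minimal d(n) for strong consistency

def laplace_conditional(xs, ys, feature_subset, x_new, num_classes):
    """Laplace (add-one) estimate of P(y | x_new) under the selected model."""
    key_new = tuple(x_new[i] for i in feature_subset)
    counts = Counter(y for x, y in zip(xs, ys)
                     if tuple(x[i] for i in feature_subset) == key_new)
    total = sum(counts.values())
    return [(counts[c] + 1) / (total + num_classes) for c in range(num_classes)]

# Toy usage: choose between g1 = {0} and g2 = {0, 1} when the label
# depends only on feature 0; the smaller true model wins.
xs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
ys = [x[0] for x in xs]
best = min([(0,), (0, 1)],
           key=lambda g: criterion(xs, ys, g, [2, 2], 2, hannan_quinn))
```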

Cite this Paper


BibTeX
@InProceedings{pmlr-vR1-suzuki97a, title = {On the Error Probability of Model Selection for Classification}, author = {Suzuki, Joe}, booktitle = {Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics}, pages = {513--520}, year = {1997}, editor = {Madigan, David and Smyth, Padhraic}, volume = {R1}, series = {Proceedings of Machine Learning Research}, month = {04--07 Jan}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/r1/suzuki97a/suzuki97a.pdf}, url = {https://proceedings.mlr.press/r1/suzuki97a.html}, abstract = {We consider model selection based on information criteria for classification. The information criterion is expressed as the empirical entropy plus a compensation term $(k(g)/2)d(n)$, where $k(g)$ is the number of independent parameters in a model $g$, $d(n)$ is a function of $n$, and $n$ is the number of examples. First, we derive, for arbitrary $d(\cdot)$, the asymptotically exact error probabilities of the model selection. Although it was known for linear/autoregressive processes that $d(n) = \log \log n$ is the minimum function of $n$ for which the model selection is strongly consistent, whether the same holds for classification had remained an open problem. We solve this problem affirmatively. Additionally, we derive, for arbitrary $d(\cdot)$, the expected Kullback-Leibler divergence between the true conditional probability and the conditional probability estimated via the model selection and the Laplace estimators. The derived value is $k(g^*)/(2n)$, where $g^*$ is a true model, and the accumulated value over $n$ time instances is $(k(g^*)/2)\log n + O(1)$, which implies the optimality of a predictive coding scheme based on the model selection. Keywords: model selection, error probability, strong consistency, Kullback-Leibler divergence, minimum description length principle, Hannan/Quinn's procedure, unseparated/separated models, Kolmogorov's law of the iterated logarithm.}, note = {Reissued by PMLR on 30 March 2021.} }
Endnote
%0 Conference Paper %T On the Error Probability of Model Selection for Classification %A Joe Suzuki %B Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 1997 %E David Madigan %E Padhraic Smyth %F pmlr-vR1-suzuki97a %I PMLR %P 513--520 %U https://proceedings.mlr.press/r1/suzuki97a.html %V R1 %X We consider model selection based on information criteria for classification. The information criterion is expressed as the empirical entropy plus a compensation term $(k(g)/2)d(n)$, where $k(g)$ is the number of independent parameters in a model $g$, $d(n)$ is a function of $n$, and $n$ is the number of examples. First, we derive, for arbitrary $d(\cdot)$, the asymptotically exact error probabilities of the model selection. Although it was known for linear/autoregressive processes that $d(n) = \log \log n$ is the minimum function of $n$ for which the model selection is strongly consistent, whether the same holds for classification had remained an open problem. We solve this problem affirmatively. Additionally, we derive, for arbitrary $d(\cdot)$, the expected Kullback-Leibler divergence between the true conditional probability and the conditional probability estimated via the model selection and the Laplace estimators. The derived value is $k(g^*)/(2n)$, where $g^*$ is a true model, and the accumulated value over $n$ time instances is $(k(g^*)/2)\log n + O(1)$, which implies the optimality of a predictive coding scheme based on the model selection. Keywords: model selection, error probability, strong consistency, Kullback-Leibler divergence, minimum description length principle, Hannan/Quinn's procedure, unseparated/separated models, Kolmogorov's law of the iterated logarithm. %Z Reissued by PMLR on 30 March 2021.
APA
Suzuki, J. (1997). On the Error Probability of Model Selection for Classification. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research R1:513-520. Available from https://proceedings.mlr.press/r1/suzuki97a.html. Reissued by PMLR on 30 March 2021.