On the Error Probability of Model Selection for Classification
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:513-520, 1997.
Abstract
We consider model selection based on information criteria for classification. The information criterion is expressed as the empirical entropy plus a compensation term (k(g)/2)d(n), where k(g) is the number of independent parameters in a model g, d(n) is a function of n, and n is the number of examples. First, we derive, for arbitrary d(⋅), the asymptotically exact error probabilities of model selection. Although it was known for linear/autoregression processes that d(n) = log log n is the minimum function of n for which model selection is strongly consistent, whether the same holds for classification had remained an open problem. We solve this problem affirmatively. Additionally, we derive, for arbitrary d(⋅), the expected Kullback-Leibler divergence between the true conditional probability and the conditional probability estimated by model selection and the Laplace estimator. The derived value is k(g∗)/(2n), where g∗ is the true model, and its accumulated value over n time instances is (k(g∗)/2) log n + O(1), which implies the optimality of a predictive coding based on the model selection.

Keywords: model selection, error probability, strong consistency, Kullback-Leibler divergence, minimum description length principle, Hannan/Quinn's procedure, unseparated/separated models, Kolmogorov's law of the iterated logarithm.
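The criterion described in the abstract selects the model g minimizing H_n(g) + (k(g)/2)d(n), where H_n(g) is the empirical entropy (the negative maximized log-likelihood) of g on n examples. The following is a minimal sketch of that selection rule, not the paper's own implementation; the function name select_model, the candidate summaries, and the numeric values are illustrative assumptions, with d chosen as the log log n compensation term discussed above.

```python
import math

def select_model(models, n, d):
    """Return the candidate minimizing the information criterion
        L(g) = H_n(g) + (k(g) / 2) * d(n),
    where H_n(g) is the empirical entropy (negative maximized
    log-likelihood) of model g on n examples and k(g) is its
    number of independent parameters.

    `models` is an iterable of (name, H_n, k) triples; `d` is the
    compensation function, e.g. d(n) = log n for an MDL/BIC-type
    penalty or d(n) = log log n for the Hannan/Quinn-type penalty
    considered in the paper.
    """
    return min(models, key=lambda m: m[1] + 0.5 * m[2] * d(n))

# Hypothetical usage: three candidate models summarized by their
# empirical entropies and parameter counts on n = 1000 examples.
candidates = [("g1", 693.1, 1), ("g2", 655.2, 3), ("g3", 650.8, 7)]
best = select_model(candidates, n=1000, d=lambda n: math.log(math.log(n)))
print(best[0])
```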