Comparing Predictive Inference Methods for Discrete Domains
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:311-318, 1997.
Abstract
Predictive inference is seen here as the process of determining the predictive distribution of a discrete variable, given a data set of training examples and the values of the other problem domain variables. We consider three approaches for computing this predictive distribution, and assume that the joint probability distribution for the variables belongs to a set of distributions determined by a set of parametric models. In the simplest case, the predictive distribution is computed by using the model with the highest posterior probability, i.e., the maximum a posteriori (MAP) model. In the evidence approach, the predictive distribution is obtained by averaging over all the individual models in the model family. In the third case, we define the predictive distribution by using Rissanen's new definition of stochastic complexity. Our experiments with the family of Naive Bayes models suggest that when all the available data are used, the stochastic complexity approach produces the most accurate predictions in the log-score sense. However, when the amount of available training data is decreased, the evidence approach clearly outperforms the other two approaches. The MAP predictive distribution is clearly inferior in the log-score sense to the two more sophisticated approaches, but under the 0/1-score the MAP approach may still in some cases produce the best results.
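For concreteness, the sketch below contrasts the three predictive rules on a toy problem: a single multinomial variable rather than the paper's full Naive Bayes family. The symmetric Dirichlet prior, the choice of alpha, and the conditional-NML reading of the stochastic complexity predictive are illustrative assumptions of this sketch, not the paper's exact construction.

```python
import numpy as np

def map_predictive(counts, alpha=1.0):
    # MAP plug-in: predict with the single parameter vector that maximizes
    # the posterior. Under a symmetric Dirichlet(alpha) prior the posterior
    # mode is (n_k + alpha - 1) / (N + K*(alpha - 1)); with alpha = 1 this
    # reduces to the maximum likelihood estimate.
    counts = np.asarray(counts, dtype=float)
    K, N = len(counts), counts.sum()
    return (counts + alpha - 1.0) / (N + K * (alpha - 1.0))

def evidence_predictive(counts, alpha=1.0):
    # Evidence approach: integrate the parameters out against the Dirichlet
    # posterior instead of picking one model, giving the posterior-mean
    # (Laplace-smoothed) predictive rule.
    counts = np.asarray(counts, dtype=float)
    K, N = len(counts), counts.sum()
    return (counts + alpha) / (N + K * alpha)

def cnml_predictive(counts):
    # Conditional NML in the spirit of Rissanen's stochastic complexity:
    # score each candidate outcome by the maximized likelihood of the
    # training data extended with that outcome, then renormalize over the
    # candidates.
    counts = np.asarray(counts, dtype=float)
    K = len(counts)
    scores = np.empty(K)
    for k in range(K):
        ext = counts.copy()
        ext[k] += 1.0
        p = ext / ext.sum()
        # maximized log-likelihood of the extended data (0*log 0 := 0)
        scores[k] = np.sum(ext[ext > 0] * np.log(p[ext > 0]))
    w = np.exp(scores - scores.max())
    return w / w.sum()

counts = [3, 1, 0]  # class counts observed in a small training set
print("MAP:     ", map_predictive(counts))       # [0.75, 0.25, 0.0]
print("evidence:", evidence_predictive(counts))  # [4/7, 2/7, 1/7]
print("cNML:    ", cnml_predictive(counts))
```

Note how the MAP plug-in assigns probability zero to the value never seen in training, so a single occurrence of that value makes its log-score unbounded; both the evidence and the conditional-NML rules smooth the counts and avoid this, which is consistent with the log-score ranking reported above.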