Model choice: A minimum posterior predictive loss approach
Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, PMLR R2, 1999.
Model choice is a fundamental activity in the analysis of data sets, an activity which has become increasingly more important as computational advances enable the fitting of increasingly complex models. Such complexity typically arises through hierarchical structure which requires specification at each stage of probabilistic mechanisms, mean and dispersion forms, explanatory variables, etc. Nonnested hierarchical models introducing random effects may not be handled by classical methods. Bayesian approaches using predictive distributions can be used though the FORMAL solution, which includes Bayes factors as a special case, can be criticized. It seems natural to evaluate model performance by comparing what it predicts with what has been observed. Most classical criteria utilize such comparison. We propose a predictive criterion where the goal is good prediction of a replicate of the observed data but tempered by fidelity to the observed values. We obtain this criterion by minimizing posterior loss for a given model and then, for models under consideration, selecting the one which minimizes this criterion. For a version of log scoring loss we can do the minimization explicitly, obtaining an expression which can be interpreted as a penalized deviance criterion. We illustrate its performance with an application to a large data set involving residential property transactions.