Bayesian Model Averaging in Rule Induction
Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, PMLR R1:157-164, 1997.
Bayesian model averaging (BMA) can be seen as the optimal approach to any induction task. It can reduce error by accounting for model uncertainty in a principled way, and its usefulness in several areas has been empirically verified. However, few attempts to apply it to rule induction have been made. This paper reports a series of experiments designed to test the utility of BMA in this field. BMA is applied to combining multiple rule sets learned from different subsets of the training data, to combining multiple rules covering a test example, to inducing technical rules for foreign exchange trading, and to inducing conjunctive concepts. In the first two cases, BMA is observed to produce lower accuracies than the ad hoc methods it is compared with. In the last two cases, BMA is observed to typically produce the same result as simply using the best (maximum-likelihood) rule, even though averaging is performed over all possible rules in the space, the domains are highly noisy, and the samples are medium- to small-sized. In all cases, this is observed to be due to BMA’s consistent tendency to assign highly asymmetric weights to different models, even when their accuracy differs by little, with most models (often all but one) effectively having no influence on the outcome. Thus the effective number of models being averaged is much smaller for BMA than for common ad hoc methods, leading to a smaller reduction in variance. This suggests that the success of the multiple models approach to rule induction is primarily due to this variance reduction, and not to its being a closer approximation to the Bayesian ideal.