A Bayesian nonparametric procedure for comparing algorithms


Alessio Benavoli, Giorgio Corani, Francesca Mangili, Marco Zaffalon
Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1264-1272, 2015.


A fundamental task in machine learning is to compare the performance of multiple algorithms. This is typically done with frequentist tests, usually the Friedman test followed by multiple pairwise comparisons. Doing so means relying on null hypothesis significance tests and p-values, although the shortcomings of such methods are well known. First, we propose a nonparametric Bayesian version of the Friedman test using a Dirichlet process (DP) based prior. Our derivations show that, from a Bayesian perspective, the Friedman test is an inference for a multivariate mean based on an ellipsoid inclusion test. Second, we derive a joint procedure for the analysis of the multiple comparisons which accounts for their dependencies and which is based on the posterior probability computed through the DP. The proposed approach allows verifying the null hypothesis, not only rejecting it. Third, we apply our test to algorithm racing, i.e., the problem of identifying the best algorithm among a large set of candidates. We show by simulation that our approach is competitive in both accuracy and speed in identifying the best algorithm.
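To make the contrast in the abstract concrete, the following is a minimal illustrative sketch, not the authors' procedure. It computes the classical Friedman statistic from per-dataset ranks, then approximates a posterior probability that one algorithm has a lower mean rank than another using the Bayesian bootstrap, which can be read as a DP posterior in the limit of vanishing prior strength. That limiting prior, the function names, and the toy scores are all assumptions made for illustration; the paper's own prior and its joint multiple-comparison procedure are not reproduced here.

```python
import random


def rank_rows(scores):
    """Rank algorithms within each dataset (rank 1 = highest score).

    Ties receive the average of the ranks they span.
    """
    ranks = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda j: -row[j])
        r = [0.0] * len(row)
        pos = 0
        while pos < len(order):
            end = pos
            while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
                end += 1
            avg = (pos + end) / 2 + 1  # average rank over the tied block
            for t in range(pos, end + 1):
                r[order[t]] = avg
            pos = end + 1
        ranks.append(r)
    return ranks


def friedman_statistic(ranks):
    """Classical Friedman chi-square statistic (no tie correction)."""
    n, k = len(ranks), len(ranks[0])
    rank_sums = [sum(r[j] for r in ranks) for j in range(k)]
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


def posterior_prob_lower_rank(ranks, i, j, n_samples=4000, seed=0):
    """Approximate P(mean rank of i < mean rank of j | data).

    Assumption: Bayesian bootstrap, i.e. Dirichlet(1, ..., 1) weights over
    datasets, standing in for a DP posterior with negligible prior strength.
    """
    rng = random.Random(seed)
    diffs = [r[i] - r[j] for r in ranks]
    wins = 0
    for _ in range(n_samples):
        # Normalised Gamma(1, 1) draws are Dirichlet(1, ..., 1) weights.
        g = [rng.gammavariate(1.0, 1.0) for _ in diffs]
        total = sum(g)
        mean_diff = sum(w * d for w, d in zip(g, diffs)) / total
        if mean_diff < 0:
            wins += 1
    return wins / n_samples


# Hypothetical accuracies: 4 datasets (rows) x 3 algorithms (columns).
toy_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.70, 0.75],
    [0.90, 0.85, 0.80],
]
toy_ranks = rank_rows(toy_scores)
toy_stat = friedman_statistic(toy_ranks)
toy_prob = posterior_prob_lower_rank(toy_ranks, 1, 2)
```

Unlike a p-value, the quantity returned by `posterior_prob_lower_rank` is a direct probability statement about the ranking. A value near 0.5 can therefore be read as support for the two algorithms being practically equivalent, which is the sense in which a Bayesian approach can verify a null hypothesis as well as reject it.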
