Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets

Alexandre Lacoste, Francois Laviolette, Mario Marchand
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:665-675, 2012.

Abstract

We propose a new method for comparing learning algorithms on multiple tasks, based on a novel non-parametric test that we call the Poisson binomial test. The key aspect of this work is that we provide a formal definition of what it means for one algorithm to be better than another. We also take into account the dependencies induced when classifiers are evaluated on the same test set. Finally, we make optimal use (in the Bayesian sense) of all the available testing data. We demonstrate empirically that our approach is more reliable than the sign test and the Wilcoxon signed rank test, the current state of the art for algorithm comparisons.
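In outline, the test proceeds in two steps: first, for each task, the test-set outcomes yield a posterior probability that one algorithm has lower risk than the other; second, these per-task probabilities are combined, the number of tasks on which the first algorithm wins being Poisson-binomial distributed. The Python sketch below illustrates this pipeline under simplifying assumptions, and is not the authors' code: it uses a flat Beta(1,1) posterior over the rate at which algorithm A beats B on the test examples where the two classifiers disagree, and the function names and example counts are illustrative.

import numpy as np
from scipy.stats import beta

def win_probability(k_wins, n_disagree):
    # Posterior Pr(A beats B on this task) under a flat Beta(1,1) prior
    # on A's win rate over the test examples where the two classifiers
    # disagree (a simplifying assumption, not the paper's exact model).
    return 1.0 - beta.cdf(0.5, k_wins + 1, n_disagree - k_wins + 1)

def poisson_binomial_pmf(probs):
    # PMF of the number of successes among independent Bernoulli trials
    # with heterogeneous probabilities, built by iterated convolution.
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

def prob_majority(probs):
    # Pr(A wins on strictly more than half of the tasks).
    pmf = poisson_binomial_pmf(probs)
    return pmf[len(probs) // 2 + 1:].sum()

# Hypothetical counts: (A's wins, disagreements) on each task's test set.
tasks = [(14, 20), (9, 15), (30, 40), (6, 14), (11, 18)]
p_win = [win_probability(k, n) for k, n in tasks]
print(prob_majority(p_win))  # posterior Pr(A is better on a majority of tasks)

Because the per-task probabilities enter the convolution individually, tasks with small or noisy test sets contribute probabilities near 0.5 and are automatically down-weighted, rather than counting as full wins or losses as in the sign test.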

Cite this Paper


BibTeX
@InProceedings{pmlr-v22-lacoste12,
  title     = {Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets},
  author    = {Lacoste, Alexandre and Laviolette, Francois and Marchand, Mario},
  booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages     = {665--675},
  year      = {2012},
  editor    = {Lawrence, Neil D. and Girolami, Mark},
  volume    = {22},
  series    = {Proceedings of Machine Learning Research},
  address   = {La Palma, Canary Islands},
  month     = {21--23 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v22/lacoste12/lacoste12.pdf},
  url       = {https://proceedings.mlr.press/v22/lacoste12.html}
}
Endnote
%0 Conference Paper
%T Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets
%A Alexandre Lacoste
%A Francois Laviolette
%A Mario Marchand
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-lacoste12
%I PMLR
%P 665--675
%U https://proceedings.mlr.press/v22/lacoste12.html
%V 22
RIS
TY - CPAPER
TI - Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets
AU - Alexandre Lacoste
AU - Francois Laviolette
AU - Mario Marchand
BT - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA - 2012/03/21
ED - Neil D. Lawrence
ED - Mark Girolami
ID - pmlr-v22-lacoste12
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 22
SP - 665
EP - 675
L1 - http://proceedings.mlr.press/v22/lacoste12/lacoste12.pdf
UR - https://proceedings.mlr.press/v22/lacoste12.html
ER -
APA
Lacoste, A., Laviolette, F., & Marchand, M. (2012). Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:665-675. Available from https://proceedings.mlr.press/v22/lacoste12.html.
