A Diversity-aware Model for Majority Vote Ensemble Accuracy

Bob Durrant, Nick Lim
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4078-4087, 2020.

Abstract

Ensemble classifiers are a successful and popular approach for classification, and are frequently found to have better generalization performance than single models in practice. Although it is widely recognized that ‘diversity’ between ensemble members is important in achieving these performance gains, for classification ensembles it is not widely understood which diversity measures are most predictive of ensemble performance, nor how large an ensemble should be for a particular application. In this paper, we explore the predictive power of several common diversity measures. Earlier work finds no clear link between these diversity measures (in isolation) and ensemble accuracy. In contrast, we show with extensive experiments that, by using the $\rho$ diversity measure of Sneath and Sokal as an estimator for the dispersion parameter of a Polya-Eggenberger distribution, we can predict the accuracy of a majority vote classifier ensemble ridiculously well, independently of the choice of base classifier family. We discuss our model and some implications of our findings, such as diversity-aware (non-greedy) pruning of a majority-voting ensemble.
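
To make the idea in the abstract concrete, here is a minimal sketch rather than the authors' implementation. It assumes that the number of correct votes among n ensemble members follows a Polya-Eggenberger (beta-binomial) distribution whose mean is the average member accuracy p and whose dispersion is a correlation-like value rho in (0, 1), using the standard parameterization rho = 1 / (alpha + beta + 1). How the paper estimates $\rho$ from an ensemble is not shown here; the function names and the inputs p and rho are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_comb(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for a beta-binomial (Polya-Eggenberger) distribution."""
    return exp(
        log_comb(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    )


def majority_vote_accuracy(n, p, rho):
    """Probability that a strict majority of n members votes correctly.

    Assumption (not taken from the paper): correct-vote counts are
    beta-binomial with mean accuracy p and dispersion rho, where
    rho = 1 / (alpha + beta + 1).
    """
    if not (0.0 < p < 1.0 and 0.0 < rho < 1.0):
        raise ValueError("p and rho must lie strictly between 0 and 1")
    scale = (1.0 - rho) / rho  # equals alpha + beta
    alpha, beta = p * scale, (1.0 - p) * scale
    # A strict majority needs at least floor(n / 2) + 1 correct votes.
    return sum(beta_binomial_pmf(k, n, alpha, beta) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    # Hypothetical inputs: 21 members, mean accuracy 0.7, two dispersion levels.
    for rho in (0.05, 0.5):
        print(rho, majority_vote_accuracy(21, 0.7, rho))
```

Under these assumptions, a small rho (members err nearly independently) recovers the familiar binomial majority-vote gain, while rho close to 1 (members err together) leaves the ensemble no better than a single member with accuracy p.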

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-durrant20a,
  title = {A Diversity-aware Model for Majority Vote Ensemble Accuracy},
  author = {Durrant, Bob and Lim, Nick},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {4078--4087},
  year = {2020},
  editor = {Chiappa, Silvia and Calandra, Roberto},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/durrant20a/durrant20a.pdf},
  url = {https://proceedings.mlr.press/v108/durrant20a.html},
  abstract = {Ensemble classifiers are a successful and popular approach for classification, and are frequently found to have better generalization performance than single models in practice. Although it is widely recognized that ‘diversity’ between ensemble members is important in achieving these performance gains, for classification ensembles it is not widely understood which diversity measures are most predictive of ensemble performance, nor how large an ensemble should be for a particular application. In this paper, we explore the predictive power of several common diversity measures and show – with extensive experiments – that contrary to earlier work that finds no clear link between these diversity measures (in isolation) and ensemble accuracy instead by using the $\rho$ diversity measure of Sneath and Sokal as an estimator for the dispersion parameter of a Polya-Eggenberger distribution we can predict, independently of the choice of base classifier family, the accuracy of a majority vote classifier ensemble ridiculously well. We discuss our model and some implications of our findings – such as diversity-aware (non-greedy) pruning of a majority-voting ensemble.}
}
Endnote
%0 Conference Paper
%T A Diversity-aware Model for Majority Vote Ensemble Accuracy
%A Bob Durrant
%A Nick Lim
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-durrant20a
%I PMLR
%P 4078--4087
%U https://proceedings.mlr.press/v108/durrant20a.html
%V 108
%X Ensemble classifiers are a successful and popular approach for classification, and are frequently found to have better generalization performance than single models in practice. Although it is widely recognized that ‘diversity’ between ensemble members is important in achieving these performance gains, for classification ensembles it is not widely understood which diversity measures are most predictive of ensemble performance, nor how large an ensemble should be for a particular application. In this paper, we explore the predictive power of several common diversity measures and show – with extensive experiments – that contrary to earlier work that finds no clear link between these diversity measures (in isolation) and ensemble accuracy instead by using the $\rho$ diversity measure of Sneath and Sokal as an estimator for the dispersion parameter of a Polya-Eggenberger distribution we can predict, independently of the choice of base classifier family, the accuracy of a majority vote classifier ensemble ridiculously well. We discuss our model and some implications of our findings – such as diversity-aware (non-greedy) pruning of a majority-voting ensemble.
APA
Durrant, B. & Lim, N. (2020). A Diversity-aware Model for Majority Vote Ensemble Accuracy. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4078-4087. Available from https://proceedings.mlr.press/v108/durrant20a.html.
