Never mind the metrics---what about the uncertainty? Visualising binary confusion matrix metric distributions to put performance in perspective

David Lovell, Dimity Miller, Jaiden Capra, Andrew P. Bradley
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:22702-22757, 2023.

Abstract

There are strong incentives to build classification systems that show outstanding performance on various datasets and benchmarks. This can encourage a narrow focus on models and the performance metrics used to evaluate and compare them—resulting in a growing body of literature to evaluate and compare metrics. This paper strives for a more balanced perspective on binary classifier performance metrics by showing how uncertainty in these metrics can easily eclipse differences in empirical performance. We emphasise the discrete nature of confusion matrices and show how they can be well represented in a 3D lattice whose cross-sections form the space of receiver operating characteristic (ROC) curves. We develop novel interactive visualisations of performance metric contours within (and beyond) ROC space, showing the discrete probability mass functions of true and false positive rates and how these relate to performance metric distributions. We aim to raise awareness of the substantial uncertainty in performance metric estimates that can arise when classifiers are evaluated on empirical datasets and benchmarks, and that performance claims should be tempered by this understanding.
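To make the paper's central point concrete, the following is a minimal sketch (not the authors' own code or tool) of how sampling variation alone induces a distribution over a confusion-matrix metric. It assumes the true-positive and false-positive counts on a finite test set are independently binomial, and enumerates the exact probability mass function this places on an example metric (F1 here); the function names (metric_pmf, f1) and the parameter values are illustrative choices, not taken from the paper.

import numpy as np
from scipy.stats import binom

def metric_pmf(P, N, tpr, fpr, metric):
    """Enumerate the exact pmf of a confusion-matrix metric, assuming
    TP ~ Binomial(P, tpr) and FP ~ Binomial(N, fpr) independently."""
    tp = np.arange(P + 1)            # every possible true-positive count
    fp = np.arange(N + 1)            # every possible false-positive count
    p_tp = binom.pmf(tp, P, tpr)     # pmf over TP counts
    p_fp = binom.pmf(fp, N, fpr)     # pmf over FP counts
    probs = {}
    for i, ptp in zip(tp, p_tp):
        for j, pfp in zip(fp, p_fp):
            m = metric(i, P - i, j, N - j)   # metric(TP, FN, FP, TN)
            probs[m] = probs.get(m, 0.0) + ptp * pfp
    vals = np.array(sorted(probs))
    return vals, np.array([probs[v] for v in vals])

def f1(tp, fn, fp, tn):
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when TP = 0
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Illustrative test set: 50 positives, 200 negatives, true TPR = 0.8, FPR = 0.1
vals, probs = metric_pmf(50, 200, 0.8, 0.1, f1)
mean = (vals * probs).sum()
cdf = np.cumsum(probs)
lo = vals[np.searchsorted(cdf, 0.025)]
hi = vals[np.searchsorted(cdf, 0.975)]
print(f"F1 mean = {mean:.3f}, central 95% interval = [{lo:.3f}, {hi:.3f}]")

Even at this modest test-set size the 95% interval of F1 spans several percentage points, which is the kind of spread the paper argues can eclipse reported differences between classifiers.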

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-lovell23a,
  title     = {Never mind the metrics---what about the uncertainty? {V}isualising binary confusion matrix metric distributions to put performance in perspective},
  author    = {Lovell, David and Miller, Dimity and Capra, Jaiden and Bradley, Andrew P.},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {22702--22757},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/lovell23a/lovell23a.pdf},
  url       = {https://proceedings.mlr.press/v202/lovell23a.html}
}
APA
Lovell, D., Miller, D., Capra, J. & Bradley, A. P. (2023). Never mind the metrics---what about the uncertainty? Visualising binary confusion matrix metric distributions to put performance in perspective. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:22702-22757. Available from https://proceedings.mlr.press/v202/lovell23a.html.
