$F_{\beta}$-plot - a visual tool for evaluating imbalanced data classifiers

Szymon Wojciechowski; Michal Wozniak

Proceedings of The Workshop on Classifier Learning from Difficult Data, PMLR 263:25-31, 2024.

Abstract

Imbalanced data classification suffers from a lack of reliable metrics. This stems primarily from the fact that for most real-life (and commonly used benchmark) problems, the user provides no information on the actual form of the loss function that should be minimized. Although metrics indicating classification quality within each class are common, the end user must then analyze several of them, which in practice makes it difficult to interpret the usefulness of a given classifier. Hence, many aggregate metrics have been proposed or adopted for the imbalanced data classification problem, but there is still no consensus on which should be used. An additional disadvantage is their ambiguity and systematic bias toward one class. Moreover, using them in experimental analyses to identify the classification models that perform well under the chosen aggregate metric carries the same drawbacks. Hence, the paper proposes a simple approach to analyzing the popular parametric metric $F_{\beta}$. We point out that, for a given pool of analyzed classifiers, it is possible to indicate when a given model should be preferred depending on user requirements.
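As a rough illustration of the idea in the abstract (not the authors' implementation), the sketch below evaluates the standard definition $F_{\beta} = (1+\beta^2)PR / (\beta^2 P + R)$ over several values of $\beta$ and reports which classifier in a pool scores highest at each one. The classifier names `A` and `B`, their precision/recall values, and the helper names `f_beta` and `preferred_by_beta` are all hypothetical and chosen only for this example.

```python
def f_beta(precision, recall, beta):
    """Standard F_beta: (1 + beta^2) * P * R / (beta^2 * P + R).

    Returns 0.0 when the denominator is zero (both P and R are 0).
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0.0:
        return 0.0
    return (1.0 + b2) * precision * recall / denom


def preferred_by_beta(classifiers, betas):
    """For each beta, return the name of the classifier with the highest F_beta.

    `classifiers` maps a name to a (precision, recall) pair.
    """
    return [
        max(classifiers, key=lambda name: f_beta(*classifiers[name], beta))
        for beta in betas
    ]


# Hypothetical pool: A favors precision, B favors recall (made-up numbers).
pool = {"A": (0.9, 0.5), "B": (0.6, 0.8)}
betas = [0.5, 1.0, 2.0]
winners = preferred_by_beta(pool, betas)
for beta, name in zip(betas, winners):
    print(f"beta={beta}: prefer {name}")
```

With these made-up numbers the preference switches from `A` to `B` as $\beta$ grows, that is, as recall is weighted more heavily. Plotting `f_beta` against a dense grid of $\beta$ values for each classifier would give a curve-per-model view of the same comparison.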

Cite this Paper

BibTeX

@InProceedings{pmlr-v263-wojciechowski24a,
  title = 	 {$F_{\beta}$-plot - a visual tool for evaluating imbalanced data classifiers},
  author =       {Wojciechowski, Szymon and Wozniak, Michal},
  booktitle = 	 {Proceedings of The Workshop on Classifier Learning from Difficult Data},
  pages = 	 {25--31},
  year = 	 {2024},
  editor = 	 {Zyblewski, Pawel and Grana, Manuel and Ksieniewicz, Pawel and Minku, Leandro},
  volume = 	 {263},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {19--20 Oct},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v263/main/assets/wojciechowski24a/wojciechowski24a.pdf},
  url = 	 {https://proceedings.mlr.press/v263/wojciechowski24a.html},
  abstract = 	 {Imbalanced data classification suffers from a lack of reliable metrics. This stems primarily from the fact that for most real-life (and commonly used benchmark) problems, we do not have information from the user on the actual form of the loss function that should be minimized. Although it is pretty common to have metrics indicating the classification quality within each class, for the end user, the analysis of several such metrics is then required, which in practice causes difficulty in interpreting the usefulness of a given classifier. Hence, many aggregate metrics have been proposed or adopted for the imbalanced data classification problem, but there is still no consensus on which should be used. An additional disadvantage is their ambiguity and systematic bias toward one class. Moreover, their use in analyzing experimental results in recognition of those classification models that perform well for the chosen aggregated metrics is burdened with the abovementioned drawbacks. Hence, the paper proposes a simple approach to analyzing the popular parametric metric $F_{\beta}$. We point out that it is possible to indicate for a given pool of analyzed classifiers when a given model should be preferred depending on user requirements.}
}

Endnote

%0 Conference Paper
%T $F_{\beta}$-plot - a visual tool for evaluating imbalanced data classifiers
%A Szymon Wojciechowski
%A Michal Wozniak
%B Proceedings of The Workshop on Classifier Learning from Difficult Data
%C Proceedings of Machine Learning Research
%D 2024
%E Pawel Zyblewski
%E Manuel Grana
%E Pawel Ksieniewicz
%E Leandro Minku
%F pmlr-v263-wojciechowski24a
%I PMLR
%P 25--31
%U https://proceedings.mlr.press/v263/wojciechowski24a.html
%V 263
%X Imbalanced data classification suffers from a lack of reliable metrics. This stems primarily from the fact that for most real-life (and commonly used benchmark) problems, we do not have information from the user on the actual form of the loss function that should be minimized. Although it is pretty common to have metrics indicating the classification quality within each class, for the end user, the analysis of several such metrics is then required, which in practice causes difficulty in interpreting the usefulness of a given classifier. Hence, many aggregate metrics have been proposed or adopted for the imbalanced data classification problem, but there is still no consensus on which should be used. An additional disadvantage is their ambiguity and systematic bias toward one class. Moreover, their use in analyzing experimental results in recognition of those classification models that perform well for the chosen aggregated metrics is burdened with the abovementioned drawbacks. Hence, the paper proposes a simple approach to analyzing the popular parametric metric $F_{\beta}$. We point out that it is possible to indicate for a given pool of analyzed classifiers when a given model should be preferred depending on user requirements.

APA

Wojciechowski, S., & Wozniak, M. (2024). $F_{\beta}$-plot - a visual tool for evaluating imbalanced data classifiers. Proceedings of The Workshop on Classifier Learning from Difficult Data, in Proceedings of Machine Learning Research 263:25-31. Available from https://proceedings.mlr.press/v263/wojciechowski24a.html.
