Performance Estimation bias in Class Imbalance with Minority Subconcepts

Colin Bellinger, Roberto Corizzo, Nathalie Japkowicz
Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 241:31-44, 2024.

Abstract

Learning classifiers from imbalanced data is known to be a challenging and important problem in machine learning. As a result, the topic has been studied from a wide variety of angles. This includes the choice of evaluation measures and understanding the implications of minority class subconcepts on model learning. In this work, however, we argue that the community may not be using precise enough evaluation measures when assessing the performance of imbalanced learning pipelines on data that includes an imbalance in the minority class subconcepts. We show that the performance estimates from standard measures used in imbalance learning are biased towards the largest minority subconcepts, and that standard imbalance correction techniques can exacerbate the bias. Finally, we demonstrate that the bias can, in part, be corrected by applying instance weighting in the evaluation measures.
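To make the idea of instance weighting in an evaluation measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes subconcept labels are known for every minority instance and weights each minority instance by the inverse of its subconcept's size, so that every subconcept contributes equally to recall. The function names `recall` and `subconcept_weights`, the weighting scheme, and the toy data are all illustrative assumptions; the paper may define its weights differently.

```python
from collections import Counter

def recall(y_true, y_pred, weights=None, positive=1):
    """Recall on the positive (minority) class, optionally instance-weighted."""
    if weights is None:
        weights = [1.0] * len(y_true)
    # Weighted count of correctly predicted positives.
    tp = sum(w for t, p, w in zip(y_true, y_pred, weights)
             if t == positive and p == positive)
    # Weighted count of all positives.
    pos = sum(w for t, w in zip(y_true, weights) if t == positive)
    return tp / pos if pos else 0.0

def subconcept_weights(y_true, subconcepts, positive=1):
    """Hypothetical scheme: weight each minority instance by 1 / (size of its subconcept)."""
    counts = Counter(s for t, s in zip(y_true, subconcepts) if t == positive)
    return [1.0 / counts[s] if t == positive else 1.0
            for t, s in zip(y_true, subconcepts)]

# Toy data (made up): 100 minority instances split 90/10 across subconcepts A and B,
# plus 900 majority instances.
y_true = [1] * 100 + [0] * 900
subconcepts = ["A"] * 90 + ["B"] * 10 + ["neg"] * 900
# A classifier that finds every instance of subconcept A and misses all of B.
y_pred = [1] * 90 + [0] * 10 + [0] * 900

plain = recall(y_true, y_pred)
weighted = recall(y_true, y_pred, subconcept_weights(y_true, subconcepts))
print(f"unweighted recall: {plain:.2f}, subconcept-weighted recall: {weighted:.2f}")
```

In this toy case the unweighted recall is 0.9 even though the smaller subconcept is missed entirely, while the subconcept-weighted recall is about 0.5, which illustrates how a standard measure can favour the largest minority subconcept.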

Cite this Paper


BibTeX
@InProceedings{pmlr-v241-bellinger24a,
  title = {Performance Estimation bias in Class Imbalance with Minority Subconcepts},
  author = {Bellinger, Colin and Corizzo, Roberto and Japkowicz, Nathalie},
  booktitle = {Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages = {31--44},
  year = {2024},
  editor = {Moniz, Nuno and Branco, Paula and Torgo, Luis and Japkowicz, Nathalie and Wozniak, Michal and Wang, Shuo},
  volume = {241},
  series = {Proceedings of Machine Learning Research},
  month = {18 Sep},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v241/bellinger24a/bellinger24a.pdf},
  url = {https://proceedings.mlr.press/v241/bellinger24a.html},
  abstract = {Learning classifiers from imbalanced data is known to be a challenging and important problem in machine learning. As a result, the topic has been studied from a wide variety of angles. This includes the choice of evaluation measures and understanding the implications of minority class subconcepts on model learning. In this work, however, we argue that the community may not be using precise enough evaluation measures when assessing the performance of imbalanced learning pipelines on data that includes an imbalance in the minority class subconcepts. We show that the performance estimates from standard measures used in imbalance learning are biased towards the largest minority subconcepts, and that standard imbalance correction techniques can exacerbate the bias. Finally, we demonstrate that the bias can, in part, be corrected by applying instance weighting in the evaluation measures.}
}
Endnote
%0 Conference Paper
%T Performance Estimation bias in Class Imbalance with Minority Subconcepts
%A Colin Bellinger
%A Roberto Corizzo
%A Nathalie Japkowicz
%B Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2024
%E Nuno Moniz
%E Paula Branco
%E Luis Torgo
%E Nathalie Japkowicz
%E Michal Wozniak
%E Shuo Wang
%F pmlr-v241-bellinger24a
%I PMLR
%P 31--44
%U https://proceedings.mlr.press/v241/bellinger24a.html
%V 241
%X Learning classifiers from imbalanced data is known to be a challenging and important problem in machine learning. As a result, the topic has been studied from a wide variety of angles. This includes the choice of evaluation measures and understanding the implications of minority class subconcepts on model learning. In this work, however, we argue that the community may not be using precise enough evaluation measures when assessing the performance of imbalanced learning pipelines on data that includes an imbalance in the minority class subconcepts. We show that the performance estimates from standard measures used in imbalance learning are biased towards the largest minority subconcepts, and that standard imbalance correction techniques can exacerbate the bias. Finally, we demonstrate that the bias can, in part, be corrected by applying instance weighting in the evaluation measures.
APA
Bellinger, C., Corizzo, R. & Japkowicz, N. (2024). Performance Estimation bias in Class Imbalance with Minority Subconcepts. Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 241:31-44. Available from https://proceedings.mlr.press/v241/bellinger24a.html.
