Performance Estimation bias in Class Imbalance with Minority Subconcepts
Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 241:31-44, 2024.
Abstract
Learning classifiers from imbalanced data is known to be a challenging and important problem in machine learning. As a result, the topic has been studied from a wide variety of angles, including the choice of evaluation measures and the implications of minority class subconcepts for model learning. In this work, however, we argue that the community may not be using sufficiently precise evaluation measures when assessing imbalanced learning pipelines on data in which the minority class subconcepts are themselves imbalanced. We show that the performance estimates from standard measures used in imbalanced learning are biased towards the largest minority subconcepts, and that standard imbalance correction techniques can exacerbate the bias. Finally, we demonstrate that the bias can, in part, be corrected by applying instance weighting in the evaluation measures.
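To illustrate the kind of bias described above, the following is a minimal sketch, not the paper's method. It assumes binary labels with 1 as the minority class, that a subconcept label is known for every minority instance, and a weighting scheme of one over the subconcept size; the abstract does not specify the weighting used. The function names `minority_recall` and `subconcept_balanced_weights` and the toy data are hypothetical.

```python
from collections import Counter


def minority_recall(y_true, y_pred, weights=None):
    """Recall on the minority class (label 1), optionally instance-weighted."""
    if weights is None:
        weights = [1.0] * len(y_true)
    # Weighted true positives and weighted count of minority instances.
    tp = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == 1 and p == 1)
    pos = sum(w for t, w in zip(y_true, weights) if t == 1)
    return tp / pos


def subconcept_balanced_weights(y_true, subconcepts):
    """Assumed scheme: weight each minority instance by 1 / (its subconcept size),
    so every minority subconcept contributes equally to the measure."""
    sizes = Counter(s for t, s in zip(y_true, subconcepts) if t == 1)
    return [1.0 / sizes[s] if t == 1 else 1.0
            for t, s in zip(y_true, subconcepts)]


# Toy data: 10 minority instances (8 in subconcept "A", 2 in "B"), 5 majority.
y_true = [1] * 10 + [0] * 5
subconcepts = ["A"] * 8 + ["B"] * 2 + ["maj"] * 5
# Hypothetical classifier: detects all of subconcept A and none of B.
y_pred = [1] * 8 + [0] * 2 + [0] * 5

plain = minority_recall(y_true, y_pred)
weights = subconcept_balanced_weights(y_true, subconcepts)
balanced = minority_recall(y_true, y_pred, weights)

print(f"unweighted recall: {plain:.2f}")
print(f"subconcept-weighted recall: {balanced:.2f}")
```

In this toy case the unweighted recall is dominated by the larger subconcept even though the smaller one is missed entirely, while the weighted version treats the two subconcepts equally.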