The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

Adrian Stando, Mustafa Cavus, Przemyslaw Biecek
Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 241:16-30, 2024.

Abstract

Imbalanced data poses a significant challenge in classification as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods - their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to the Variable Importance method, this study uses Partial Dependence and Accumulated Local Effects profiles. Real and simulated datasets are tested, and an open-source Python package edgaro is developed to facilitate this analysis. The results obtained show significant changes in model behavior due to balancing methods, which can lead to biased models toward a balanced distribution. These findings confirm that balancing analysis should go beyond model performance comparisons to achieve higher reliability of machine learning models. Therefore, we propose a new method performance gain plot for informed data balancing strategy to make an optimal selection of balancing method by analyzing the measure of change in model behavior versus performance gain.

Cite this Paper


BibTeX
@InProceedings{pmlr-v241-stando24a,
  title = {The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems},
  author = {Stando, Adrian and Cavus, Mustafa and Biecek, Przemyslaw},
  booktitle = {Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages = {16--30},
  year = {2024},
  editor = {Moniz, Nuno and Branco, Paula and Torgo, Luis and Japkowicz, Nathalie and Wozniak, Michal and Wang, Shuo},
  volume = {241},
  series = {Proceedings of Machine Learning Research},
  month = {18 Sep},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v241/stando24a/stando24a.pdf},
  url = {https://proceedings.mlr.press/v241/stando24a.html},
  abstract = {Imbalanced data poses a significant challenge in classification as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods - their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to the Variable Importance method, this study uses Partial Dependence and Accumulated Local Effects profiles. Real and simulated datasets are tested, and an open-source Python package edgaro is developed to facilitate this analysis. The results obtained show significant changes in model behavior due to balancing methods, which can lead to biased models toward a balanced distribution. These findings confirm that balancing analysis should go beyond model performance comparisons to achieve higher reliability of machine learning models. Therefore, we propose a new method performance gain plot for informed data balancing strategy to make an optimal selection of balancing method by analyzing the measure of change in model behavior versus performance gain.}
}
Endnote
%0 Conference Paper
%T The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems
%A Adrian Stando
%A Mustafa Cavus
%A Przemyslaw Biecek
%B Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2024
%E Nuno Moniz
%E Paula Branco
%E Luis Torgo
%E Nathalie Japkowicz
%E Michal Wozniak
%E Shuo Wang
%F pmlr-v241-stando24a
%I PMLR
%P 16--30
%U https://proceedings.mlr.press/v241/stando24a.html
%V 241
%X Imbalanced data poses a significant challenge in classification as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods - their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to the Variable Importance method, this study uses Partial Dependence and Accumulated Local Effects profiles. Real and simulated datasets are tested, and an open-source Python package edgaro is developed to facilitate this analysis. The results obtained show significant changes in model behavior due to balancing methods, which can lead to biased models toward a balanced distribution. These findings confirm that balancing analysis should go beyond model performance comparisons to achieve higher reliability of machine learning models. Therefore, we propose a new method performance gain plot for informed data balancing strategy to make an optimal selection of balancing method by analyzing the measure of change in model behavior versus performance gain.
APA
Stando, A., Cavus, M., & Biecek, P. (2024). The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems. Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 241:16-30. Available from https://proceedings.mlr.press/v241/stando24a.html.
