Probabilistic Metric to measure the imbalance in multi-class problems

Solander Patricio Lopes Agostinho, João Mendes-Moreira
Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 183:151-162, 2022.

Abstract

In machine learning, imbalanced data has been one of the most relevant issue that the classifiers have to deal with. The most common techniques applied in this scenario are all, somehow, based on oversampling or under sampling concepts, In the former, the number of instances of minority classes are, somehow, increased while in the latter, the number of instances in the majority classes are somehow reduced. By increasing Pre-processing, approaches as the ones described have been well succeeded in binary classification problems.However, as the larger the number of classes, less effective the pre-processing approaches are. Another related problem is that the metrics that evaluate the predictive performance of the classifiers can be not effective in the presence of imbalanced data. The metrics used to measure the predictive performance of classifiers, can be divided into three groups: threshold, ranking and Probabilistic metrics. This paper aimed to purpose a probabilistic metric with the main objective of, given the results of a classifier in a multi-class domain, verify the relation between these result and the imbalance problem. The main purpose of this work, is to build a probabilistic metric based on non-parametric approaches, to measure the effect of imbalance feature of dataset in multi-class problems. As part of the work, a comparison with the existing metrics will be implemented and analyzed, both to understand the relation between them and to choose the best of them according to each scenario

Cite this Paper


BibTeX
@InProceedings{pmlr-v183-agostinho22a, title = {Probabilistic Metric to measure the imbalance in multi-class problems}, author = {Agostinho, Solander Patricio Lopes and Mendes-Moreira, Jo\~{a}o}, booktitle = {Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications}, pages = {151--162}, year = {2022}, editor = {Moniz, Nuno and Branco, Paula and Torgo, Luís and Japkowicz, Nathalie and Wozniak, Michal and Wang, Shuo}, volume = {183}, series = {Proceedings of Machine Learning Research}, month = {23 Sep}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v183/agostinho22a/agostinho22a.pdf}, url = {https://proceedings.mlr.press/v183/agostinho22a.html}, abstract = {In machine learning, imbalanced data has been one of the most relevant issue that the classifiers have to deal with. The most common techniques applied in this scenario are all, somehow, based on oversampling or under sampling concepts, In the former, the number of instances of minority classes are, somehow, increased while in the latter, the number of instances in the majority classes are somehow reduced. By increasing Pre-processing, approaches as the ones described have been well succeeded in binary classification problems.However, as the larger the number of classes, less effective the pre-processing approaches are. Another related problem is that the metrics that evaluate the predictive performance of the classifiers can be not effective in the presence of imbalanced data. The metrics used to measure the predictive performance of classifiers, can be divided into three groups: threshold, ranking and Probabilistic metrics. This paper aimed to purpose a probabilistic metric with the main objective of, given the results of a classifier in a multi-class domain, verify the relation between these result and the imbalance problem. The main purpose of this work, is to build a probabilistic metric based on non-parametric approaches, to measure the effect of imbalance feature of dataset in multi-class problems. As part of the work, a comparison with the existing metrics will be implemented and analyzed, both to understand the relation between them and to choose the best of them according to each scenario} }
Endnote
%0 Conference Paper %T Probabilistic Metric to measure the imbalance in multi-class problems %A Solander Patricio Lopes Agostinho %A João Mendes-Moreira %B Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications %C Proceedings of Machine Learning Research %D 2022 %E Nuno Moniz %E Paula Branco %E Luís Torgo %E Nathalie Japkowicz %E Michal Wozniak %E Shuo Wang %F pmlr-v183-agostinho22a %I PMLR %P 151--162 %U https://proceedings.mlr.press/v183/agostinho22a.html %V 183 %X In machine learning, imbalanced data has been one of the most relevant issue that the classifiers have to deal with. The most common techniques applied in this scenario are all, somehow, based on oversampling or under sampling concepts, In the former, the number of instances of minority classes are, somehow, increased while in the latter, the number of instances in the majority classes are somehow reduced. By increasing Pre-processing, approaches as the ones described have been well succeeded in binary classification problems.However, as the larger the number of classes, less effective the pre-processing approaches are. Another related problem is that the metrics that evaluate the predictive performance of the classifiers can be not effective in the presence of imbalanced data. The metrics used to measure the predictive performance of classifiers, can be divided into three groups: threshold, ranking and Probabilistic metrics. This paper aimed to purpose a probabilistic metric with the main objective of, given the results of a classifier in a multi-class domain, verify the relation between these result and the imbalance problem. The main purpose of this work, is to build a probabilistic metric based on non-parametric approaches, to measure the effect of imbalance feature of dataset in multi-class problems. As part of the work, a comparison with the existing metrics will be implemented and analyzed, both to understand the relation between them and to choose the best of them according to each scenario
APA
Agostinho, S.P.L. & Mendes-Moreira, J.. (2022). Probabilistic Metric to measure the imbalance in multi-class problems. Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 183:151-162 Available from https://proceedings.mlr.press/v183/agostinho22a.html.

Related Material