Influence of minority class instance types on SMOTE imbalanced data oversampling

Przemysław Skryjomski, Bartosz Krawczyk
Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 74:7-21, 2017.

Abstract

Despite more than two decades of intense research, learning from imbalanced data remains one of the major difficulties for computational intelligence systems. Among the plethora of techniques dedicated to alleviating this problem, preprocessing algorithms are considered among the most effective. They aim at re-balancing the training set by either undersampling the majority class or oversampling the minority one. Here, the Synthetic Minority Oversampling Technique, commonly known as SMOTE, stands as the most popular solution; it introduces artificial instances on the basis of the minority class neighborhood distribution. However, many recent works point to the fact that the imbalance ratio itself is not the sole source of learning difficulties in such scenarios. One should take a deeper look into the minority class structure in order to identify which instances influence classifier performance most significantly. In this paper, we investigate the role of minority class instance types in the performance of SMOTE. To achieve this, instead of oversampling the minority class uniformly, we preprocess only selected subsets of instances, based on their individual difficulties. An experimental study shows that such selective oversampling leads to improved classification performance.
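
For readers who want to see the idea in code, below is a minimal, illustrative Python sketch (not the authors' reference implementation) of SMOTE-style interpolation restricted to a chosen subset of minority instances. Instance difficulty is approximated by the fraction of majority-class points among the k nearest neighbours, following the commonly used safe/borderline/rare/outlier taxonomy; the thresholds, function names (minority_types, selective_smote), and default parameters are assumptions made for illustration only.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def minority_types(X, y, minority_label, k=5):
    """Categorize each minority instance by the share of majority-class neighbours."""
    X_min = X[y == minority_label]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X_min)               # first neighbour is the point itself
    frac_maj = (y[idx[:, 1:]] != minority_label).mean(axis=1)
    types = np.empty(len(X_min), dtype=object)
    types[frac_maj <= 0.3] = "safe"
    types[(frac_maj > 0.3) & (frac_maj <= 0.7)] = "borderline"
    types[(frac_maj > 0.7) & (frac_maj < 1.0)] = "rare"
    types[frac_maj == 1.0] = "outlier"
    return X_min, types

def selective_smote(X, y, minority_label, focus=("borderline", "rare"),
                    n_synthetic=100, k=5, seed=0):
    """SMOTE-style interpolation seeded only from the selected instance types."""
    rng = np.random.default_rng(seed)
    X_min, types = minority_types(X, y, minority_label, k)
    seeds = X_min[np.isin(types, focus)]
    if len(seeds) < 2:
        return X, y                              # too few seed instances to interpolate
    nn = NearestNeighbors(n_neighbors=min(k, len(seeds) - 1) + 1).fit(seeds)
    _, idx = nn.kneighbors(seeds)
    new_points = []
    for _ in range(n_synthetic):
        i = rng.integers(len(seeds))
        j = idx[i, rng.integers(1, idx.shape[1])]    # random neighbour within the subset
        new_points.append(seeds[i] + rng.random() * (seeds[j] - seeds[i]))
    X_out = np.vstack([X, new_points])
    y_out = np.concatenate([y, np.full(n_synthetic, minority_label)])
    return X_out, y_out

With a dataset loaded as NumPy arrays, a call such as X_res, y_res = selective_smote(X, y, minority_label=1) can then feed any standard classifier; letting focus cover every category recovers plain, uniform SMOTE-style oversampling for comparison.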

Cite this Paper


BibTeX
@InProceedings{pmlr-v74-skryjomski17a,
  title = {Influence of minority class instance types on SMOTE imbalanced data oversampling},
  author = {Skryjomski, Przemysław and Krawczyk, Bartosz},
  booktitle = {Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages = {7--21},
  year = {2017},
  editor = {Torgo, Luís and Branco, Paula and Moniz, Nuno},
  volume = {74},
  series = {Proceedings of Machine Learning Research},
  month = {22 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v74/skryjomski17a/skryjomski17a.pdf},
  url = {https://proceedings.mlr.press/v74/skryjomski17a.html},
  abstract = {Despite more than two decades of intense research, learning from imbalanced data remains one of the major difficulties for computational intelligence systems. Among the plethora of techniques dedicated to alleviating this problem, preprocessing algorithms are considered among the most effective. They aim at re-balancing the training set by either undersampling the majority class or oversampling the minority one. Here, the Synthetic Minority Oversampling Technique, commonly known as SMOTE, stands as the most popular solution; it introduces artificial instances on the basis of the minority class neighborhood distribution. However, many recent works point to the fact that the imbalance ratio itself is not the sole source of learning difficulties in such scenarios. One should take a deeper look into the minority class structure in order to identify which instances influence classifier performance most significantly. In this paper, we investigate the role of minority class instance types in the performance of SMOTE. To achieve this, instead of oversampling the minority class uniformly, we preprocess only selected subsets of instances, based on their individual difficulties. An experimental study shows that such selective oversampling leads to improved classification performance.}
}
Endnote
%0 Conference Paper
%T Influence of minority class instance types on SMOTE imbalanced data oversampling
%A Przemysław Skryjomski
%A Bartosz Krawczyk
%B Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2017
%E Luís Torgo
%E Paula Branco
%E Nuno Moniz
%F pmlr-v74-skryjomski17a
%I PMLR
%P 7--21
%U https://proceedings.mlr.press/v74/skryjomski17a.html
%V 74
%X Despite more than two decades of intense research, learning from imbalanced data remains one of the major difficulties for computational intelligence systems. Among the plethora of techniques dedicated to alleviating this problem, preprocessing algorithms are considered among the most effective. They aim at re-balancing the training set by either undersampling the majority class or oversampling the minority one. Here, the Synthetic Minority Oversampling Technique, commonly known as SMOTE, stands as the most popular solution; it introduces artificial instances on the basis of the minority class neighborhood distribution. However, many recent works point to the fact that the imbalance ratio itself is not the sole source of learning difficulties in such scenarios. One should take a deeper look into the minority class structure in order to identify which instances influence classifier performance most significantly. In this paper, we investigate the role of minority class instance types in the performance of SMOTE. To achieve this, instead of oversampling the minority class uniformly, we preprocess only selected subsets of instances, based on their individual difficulties. An experimental study shows that such selective oversampling leads to improved classification performance.
APA
Skryjomski, P. & Krawczyk, B. (2017). Influence of minority class instance types on SMOTE imbalanced data oversampling. Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 74:7-21. Available from https://proceedings.mlr.press/v74/skryjomski17a.html.