ExNN-SMOTE: Extended Natural Neighbors Based SMOTE to Deal with Imbalanced Data

Hongjiao Guan, Bin Ma, Yingtao Zhang, Xianglong Tang
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:902-917, 2021.

Abstract

Many practical applications suffer from the problem of imbalanced classification. The minority class has poor classification performance; on the other hand, its misclassification cost is high. One reason for classification difficulty is the intrinsic complicated distribution characteristics (CDCs) in imbalanced data itself. Classical oversampling method SMOTE generates synthetic minority class examples between neighbors, which is parameter dependent. Furthermore, due to blindness of neighbor selection, SMOTE suffers from overgeneralization in the minority class. To solve such problems, we propose an oversampling method, called extended natural neighbors based SMOTE (ExNN-SMOTE). In ExNN-SMOTE, neighbors are determined adaptively by capturing data distribution characteristics. Extensive experiments over synthetic and real datasets demonstrate the effectiveness of ExNN-SMOTE dealing with CDCs and the superiority of ExNN-SMOTE over other SMOTE-related methods.
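For readers unfamiliar with the baseline, the following is a minimal sketch of classical SMOTE-style interpolation, not of ExNN-SMOTE itself. It assumes Euclidean distance and a fixed neighbor count `k`, which is the parameter dependence the abstract refers to. The function name `smote_sketch` and the toy data are hypothetical and do not come from the paper.

```python
import math
import random


def smote_sketch(minority, k=5, n_new=10, seed=0):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen minority point and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority points by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment x -> neighbor.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbor)))
    return synthetic


# Toy minority class (assumed data): four clustered points and one outlier.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
new_points = smote_sketch(minority, k=2, n_new=6)
```

Because `k` is fixed in advance and neighbors are picked without regard to the surrounding class distribution, an outlier such as `(5.0, 5.0)` can seed synthetic points far from the main cluster. This is one way to picture the overgeneralization that the abstract attributes to SMOTE.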

Cite this Paper


BibTeX
@InProceedings{pmlr-v157-guan21a,
  title     = {ExNN-SMOTE: Extended Natural Neighbors Based SMOTE to Deal with Imbalanced Data},
  author    = {Guan, Hongjiao and Ma, Bin and Zhang, Yingtao and Tang, Xianglong},
  booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
  pages     = {902--917},
  year      = {2021},
  editor    = {Balasubramanian, Vineeth N. and Tsang, Ivor},
  volume    = {157},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--19 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v157/guan21a/guan21a.pdf},
  url       = {https://proceedings.mlr.press/v157/guan21a.html},
  abstract  = {Many practical applications suffer from the problem of imbalanced classification. The minority class has poor classification performance; on the other hand, its misclassification cost is high. One reason for classification difficulty is the intrinsic complicated distribution characteristics (CDCs) in imbalanced data itself. Classical oversampling method SMOTE generates synthetic minority class examples between neighbors, which is parameter dependent. Furthermore, due to blindness of neighbor selection, SMOTE suffers from overgeneralization in the minority class. To solve such problems, we propose an oversampling method, called extended natural neighbors based SMOTE (ExNN-SMOTE). In ExNN-SMOTE, neighbors are determined adaptively by capturing data distribution characteristics. Extensive experiments over synthetic and real datasets demonstrate the effectiveness of ExNN-SMOTE dealing with CDCs and the superiority of ExNN-SMOTE over other SMOTE-related methods.}
}
Endnote
%0 Conference Paper
%T ExNN-SMOTE: Extended Natural Neighbors Based SMOTE to Deal with Imbalanced Data
%A Hongjiao Guan
%A Bin Ma
%A Yingtao Zhang
%A Xianglong Tang
%B Proceedings of The 13th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Vineeth N. Balasubramanian
%E Ivor Tsang
%F pmlr-v157-guan21a
%I PMLR
%P 902--917
%U https://proceedings.mlr.press/v157/guan21a.html
%V 157
%X Many practical applications suffer from the problem of imbalanced classification. The minority class has poor classification performance; on the other hand, its misclassification cost is high. One reason for classification difficulty is the intrinsic complicated distribution characteristics (CDCs) in imbalanced data itself. Classical oversampling method SMOTE generates synthetic minority class examples between neighbors, which is parameter dependent. Furthermore, due to blindness of neighbor selection, SMOTE suffers from overgeneralization in the minority class. To solve such problems, we propose an oversampling method, called extended natural neighbors based SMOTE (ExNN-SMOTE). In ExNN-SMOTE, neighbors are determined adaptively by capturing data distribution characteristics. Extensive experiments over synthetic and real datasets demonstrate the effectiveness of ExNN-SMOTE dealing with CDCs and the superiority of ExNN-SMOTE over other SMOTE-related methods.
APA
Guan, H., Ma, B., Zhang, Y. & Tang, X. (2021). ExNN-SMOTE: Extended Natural Neighbors Based SMOTE to Deal with Imbalanced Data. Proceedings of The 13th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 157:902-917. Available from https://proceedings.mlr.press/v157/guan21a.html.
