ImWeights: Classifying Imbalanced Data Using Local and Neighborhood Information

[edit]

Mateusz Lango, Dariusz Brzezinski, Jerzy Stefanowski ;
Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 94:95-109, 2018.

Abstract

Preprocessing methods for imbalanced data transform the training data to a form more suitable for learning classifiers. Most of these methods either focus on local relationships between single training examples or analyze the global characteristics of the data, such as the class imbalance ratio in the dataset. However, they do not sufficiently exploit the combination of both these views. In this paper, we put forward a new data preprocessing method called ImWeights, which weights training examples according to their local difficulty (safety) and the vicinity of larger minority clusters (gravity). Experiments with real-world datasets show that ImWeights is on par with local and global preprocessing methods, while being the least memory intensive. The introduced notion of minority cluster gravity opens new lines of research for specialized preprocessing methods and classifier modifications for imbalanced data.

Related Material