ImWeights: Classifying Imbalanced Data Using Local and Neighborhood Information

Mateusz Lango, Dariusz Brzezinski, Jerzy Stefanowski
Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 94:95-109, 2018.

Abstract

Preprocessing methods for imbalanced data transform the training data to a form more suitable for learning classifiers. Most of these methods either focus on local relationships between single training examples or analyze the global characteristics of the data, such as the class imbalance ratio in the dataset. However, they do not sufficiently exploit the combination of both these views. In this paper, we put forward a new data preprocessing method called ImWeights, which weights training examples according to their local difficulty (safety) and the vicinity of larger minority clusters (gravity). Experiments with real-world datasets show that ImWeights is on par with local and global preprocessing methods, while being the least memory intensive. The introduced notion of minority cluster gravity opens new lines of research for specialized preprocessing methods and classifier modifications for imbalanced data.
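
The idea of weighting examples by local difficulty can be illustrated with a minimal sketch. This is not the ImWeights algorithm: the abstract does not give its formulas, so the code below assumes a common k-nearest-neighbour notion of safety (the fraction of an example's k nearest neighbours that share its class) and a made-up weighting rule. The function names `knn_safety` and `illustrative_weights`, the parameter `k`, and the formula `1 + (1 - safety)` are all hypothetical. Gravity is left out because the abstract does not define it precisely enough to sketch.

```python
import numpy as np


def knn_safety(X, y, k=5):
    """Fraction of each example's k nearest neighbours that share its label.

    Assumption: Euclidean distance and a brute-force neighbour search.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all examples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # An example must not count as its own neighbour.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest examples for every row.
    nn = np.argsort(dist, axis=1)[:, :k]
    return (y[nn] == y[:, None]).mean(axis=1)


def illustrative_weights(X, y, minority_label, k=5):
    """Hypothetical rule: unsafe minority examples receive larger weights.

    Majority examples keep weight 1; a minority example with safety s
    gets weight 1 + (1 - s). This is an illustration only, not the
    weighting used in the paper.
    """
    y = np.asarray(y)
    safety = knn_safety(X, y, k)
    weights = np.ones(len(y))
    is_minority = y == minority_label
    weights[is_minority] = 1.0 + (1.0 - safety[is_minority])
    return weights


if __name__ == "__main__":
    # Four majority points close together, one isolated minority point.
    X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
    y = np.array([0, 0, 0, 0, 1])
    print(illustrative_weights(X, y, minority_label=1, k=2))
```

Weights of this kind could then be handed to any learner that accepts per-example weights, for instance through the `sample_weight` argument that many scikit-learn estimators take in `fit`.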

Cite this Paper


BibTeX
@InProceedings{pmlr-v94-lango18a,
  title = {ImWeights: Classifying Imbalanced Data Using Local and Neighborhood Information},
  author = {Lango, Mateusz and Brzezinski, Dariusz and Stefanowski, Jerzy},
  booktitle = {Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages = {95--109},
  year = {2018},
  editor = {Torgo, Luís and Matwin, Stan and Japkowicz, Nathalie and Krawczyk, Bartosz and Moniz, Nuno and Branco, Paula},
  volume = {94},
  series = {Proceedings of Machine Learning Research},
  month = {10 Sep},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v94/lango18a/lango18a.pdf},
  url = {https://proceedings.mlr.press/v94/lango18a.html},
  abstract = {Preprocessing methods for imbalanced data transform the training data to a form more suitable for learning classifiers. Most of these methods either focus on local relationships between single training examples or analyze the global characteristics of the data, such as the class imbalance ratio in the dataset. However, they do not sufficiently exploit the combination of both these views. In this paper, we put forward a new data preprocessing method called ImWeights, which weights training examples according to their local difficulty (safety) and the vicinity of larger minority clusters (gravity). Experiments with real-world datasets show that ImWeights is on par with local and global preprocessing methods, while being the least memory intensive. The introduced notion of minority cluster gravity opens new lines of research for specialized preprocessing methods and classifier modifications for imbalanced data.}
}
Endnote
%0 Conference Paper
%T ImWeights: Classifying Imbalanced Data Using Local and Neighborhood Information
%A Mateusz Lango
%A Dariusz Brzezinski
%A Jerzy Stefanowski
%B Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2018
%E Luís Torgo
%E Stan Matwin
%E Nathalie Japkowicz
%E Bartosz Krawczyk
%E Nuno Moniz
%E Paula Branco
%F pmlr-v94-lango18a
%I PMLR
%P 95--109
%U https://proceedings.mlr.press/v94/lango18a.html
%V 94
%X Preprocessing methods for imbalanced data transform the training data to a form more suitable for learning classifiers. Most of these methods either focus on local relationships between single training examples or analyze the global characteristics of the data, such as the class imbalance ratio in the dataset. However, they do not sufficiently exploit the combination of both these views. In this paper, we put forward a new data preprocessing method called ImWeights, which weights training examples according to their local difficulty (safety) and the vicinity of larger minority clusters (gravity). Experiments with real-world datasets show that ImWeights is on par with local and global preprocessing methods, while being the least memory intensive. The introduced notion of minority cluster gravity opens new lines of research for specialized preprocessing methods and classifier modifications for imbalanced data.
APA
Lango, M., Brzezinski, D. & Stefanowski, J. (2018). ImWeights: Classifying Imbalanced Data Using Local and Neighborhood Information. Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 94:95-109. Available from https://proceedings.mlr.press/v94/lango18a.html.
