The Influence of Multiple Classes on Learning from Imbalanced Data Streams

Agnieszka Lipska, Jerzy Stefanowski
Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 183:187-198, 2022.

Abstract

This work examines how local data characteristics and drifts affect the difficulty of learning online classifiers from multi-class imbalanced data streams. Experiments with synthetically generated data streams show that overlapping between multiple minority classes (the borderline type of examples) plays a much greater role than in streams with a single minority class. The presence of rare examples in the stream is the most difficult single factor. Unlike in binary streams, the specialized UOB and OOB classifiers perform well enough even at high imbalance ratios. The most challenging scenarios for all classifiers are complex ones that combine many drifts and factors simultaneously; these degrade the evaluation measures more strongly than in binary streams.

Cite this Paper


BibTeX
@InProceedings{pmlr-v183-lipska22a,
  title     = {The Influence of Multiple Classes on Learning from Imbalanced Data Streams},
  author    = {Lipska, Agnieszka and Stefanowski, Jerzy},
  booktitle = {Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages     = {187--198},
  year      = {2022},
  editor    = {Moniz, Nuno and Branco, Paula and Torgo, Luís and Japkowicz, Nathalie and Wozniak, Michal and Wang, Shuo},
  volume    = {183},
  series    = {Proceedings of Machine Learning Research},
  month     = {23 Sep},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v183/lipska22a/lipska22a.pdf},
  url       = {https://proceedings.mlr.press/v183/lipska22a.html},
  abstract  = {This work is aimed at examining the influence of local data characteristics and drifts on the difficulties of learning online classifiers from multi-class imbalanced data streams. The results of many experiments with synthetically generated data streams have shown a much greater role of the overlapping between many minority classes (the type of borderline examples) than for streams with one minority class. The presence of rare examples in the stream is the most difficult single factor. Unlike binary streams, the specialized UOB and OOB classifiers perform well enough for even high imbalance ratios. The most challenging for all classifiers are complex scenarios integrating many drifts and factors simultaneously, which worsen the evaluation measures stronger than for binary ones.}
}
Endnote
%0 Conference Paper
%T The Influence of Multiple Classes on Learning from Imbalanced Data Streams
%A Agnieszka Lipska
%A Jerzy Stefanowski
%B Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2022
%E Nuno Moniz
%E Paula Branco
%E Luís Torgo
%E Nathalie Japkowicz
%E Michal Wozniak
%E Shuo Wang
%F pmlr-v183-lipska22a
%I PMLR
%P 187--198
%U https://proceedings.mlr.press/v183/lipska22a.html
%V 183
%X This work is aimed at examining the influence of local data characteristics and drifts on the difficulties of learning online classifiers from multi-class imbalanced data streams. The results of many experiments with synthetically generated data streams have shown a much greater role of the overlapping between many minority classes (the type of borderline examples) than for streams with one minority class. The presence of rare examples in the stream is the most difficult single factor. Unlike binary streams, the specialized UOB and OOB classifiers perform well enough for even high imbalance ratios. The most challenging for all classifiers are complex scenarios integrating many drifts and factors simultaneously, which worsen the evaluation measures stronger than for binary ones.
APA
Lipska, A. & Stefanowski, J. (2022). The Influence of Multiple Classes on Learning from Imbalanced Data Streams. Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 183:187-198. Available from https://proceedings.mlr.press/v183/lipska22a.html.
