SMOGN: a Pre-processing Approach for Imbalanced Regression

Paula Branco, Luís Torgo, Rita P. Ribeiro
Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 74:36-50, 2017.

Abstract

The problem of imbalanced domains, framed within predictive tasks, is relevant in many practical applications. When dealing with imbalanced domains a performance degradation is usually observed on the most rare and relevant cases for the user. This problem has been thoroughly studied within a classification setting where the target variable is nominal. The exploration of this problem in other contexts is more recent within the research community. For regression tasks, where the target variable is continuous, only a few solutions exist. Pre-processing strategies are among the most successful proposals for tackling this problem. In this paper we propose a new pre-processing approach for dealing with imbalanced regression. Our algorithm, SMOGN, incorporates two existing proposals trying to solve problems detected in both of them. We show that SMOGN has advantages in comparison to other approaches. We also show that our method has a different impact on the learners used, displaying more advantages for Random Forest and Multivariate Adaptive Regression Splines learners.

Cite this Paper


BibTeX
@InProceedings{pmlr-v74-branco17a,
  title     = {{SMOGN}: a Pre-processing Approach for Imbalanced Regression},
  author    = {Branco, Paula and Torgo, Luís and Ribeiro, Rita P.},
  booktitle = {Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages     = {36--50},
  year      = {2017},
  editor    = {Torgo, Luís and Branco, Paula and Moniz, Nuno},
  volume    = {74},
  series    = {Proceedings of Machine Learning Research},
  month     = {22 Sep},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v74/branco17a/branco17a.pdf},
  url       = {https://proceedings.mlr.press/v74/branco17a.html},
  abstract  = {The problem of imbalanced domains, framed within predictive tasks, is relevant in many practical applications. When dealing with imbalanced domains a performance degradation is usually observed on the most rare and relevant cases for the user. This problem has been thoroughly studied within a classification setting where the target variable is nominal. The exploration of this problem in other contexts is more recent within the research community. For regression tasks, where the target variable is continuous, only a few solutions exist. Pre-processing strategies are among the most successful proposals for tackling this problem. In this paper we propose a new pre-processing approach for dealing with imbalanced regression. Our algorithm, SMOGN, incorporates two existing proposals trying to solve problems detected in both of them. We show that SMOGN has advantages in comparison to other approaches. We also show that our method has a different impact on the learners used, displaying more advantages for Random Forest and Multivariate Adaptive Regression Splines learners.}
}
Endnote
%0 Conference Paper
%T SMOGN: a Pre-processing Approach for Imbalanced Regression
%A Paula Branco
%A Luís Torgo
%A Rita P. Ribeiro
%B Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2017
%E Luís Torgo
%E Paula Branco
%E Nuno Moniz
%F pmlr-v74-branco17a
%I PMLR
%P 36--50
%U https://proceedings.mlr.press/v74/branco17a.html
%V 74
%X The problem of imbalanced domains, framed within predictive tasks, is relevant in many practical applications. When dealing with imbalanced domains a performance degradation is usually observed on the most rare and relevant cases for the user. This problem has been thoroughly studied within a classification setting where the target variable is nominal. The exploration of this problem in other contexts is more recent within the research community. For regression tasks, where the target variable is continuous, only a few solutions exist. Pre-processing strategies are among the most successful proposals for tackling this problem. In this paper we propose a new pre-processing approach for dealing with imbalanced regression. Our algorithm, SMOGN, incorporates two existing proposals trying to solve problems detected in both of them. We show that SMOGN has advantages in comparison to other approaches. We also show that our method has a different impact on the learners used, displaying more advantages for Random Forest and Multivariate Adaptive Regression Splines learners.
APA
Branco, P., Torgo, L. & Ribeiro, R.P. (2017). SMOGN: a Pre-processing Approach for Imbalanced Regression. Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 74:36-50. Available from https://proceedings.mlr.press/v74/branco17a.html.

Related Material