Improving Resampling-based Ensemble in Churn Prediction
Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 74:79-91, 2017.
Dealing with class imbalance is a challenging issue in churn prediction. Although resampling-based ensemble solutions have demonstrated their superiority in many fields, previous research shows that they fail to improve the profit-based measure in churn prediction. In this paper, we use experiments on real-world churn prediction data sets to explore how the class ratio in the training subsets affects the predictive performance of resampling-based ensemble techniques. The experimental results show that the choice of class ratio has a great impact on model performance. We also find that, with suitable class ratios in the training subsets, UnderBagging and Balanced Random Forests can significantly improve the profits brought by the churn prediction model. These results provide some guidelines for both academic researchers and industrial practitioners.
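To make the notion of a class ratio in the training subsets concrete, the following is a minimal sketch of how one common variant of under-sampling-based bagging might build its subsets. It is not the authors' implementation. The function name `undersample_subsets` is hypothetical, and defining `ratio` as the number of majority (non-churn) examples per minority (churn) example is an assumption, since the abstract does not state how the ratio is expressed. Each subset keeps every minority example and draws majority examples without replacement.

```python
import random

def undersample_subsets(labels, ratio, n_subsets, seed=0):
    """Build index lists for an under-sampling ensemble (illustrative only).

    labels: sequence of 0/1 class labels, where 1 marks the minority class.
    ratio: assumed to mean majority examples per minority example.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Cap the draw at the number of majority examples available.
    n_major = min(len(majority), round(ratio * len(minority)))
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, n_major)  # sampled without replacement
        subsets.append(minority + drawn)
    return subsets

# Toy data: 10 churners among 100 customers.
labels = [1] * 10 + [0] * 90
subsets = undersample_subsets(labels, ratio=2.0, n_subsets=5)
```

A base classifier would then be trained on each subset and the predictions combined. With `ratio=1.0` the subsets are balanced, while larger values keep more of the majority class, which is the setting the paper varies.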