A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database

Jianjun Xie, Viktoria Rojkova, Siddharth Pal, Stephen Coggeshall
Proceedings of KDD-Cup 2009 Competition, PMLR 7:35-43, 2009.

Abstract

We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency and up-selling of wireless customers. We choose stochastic gradient boosting tree (TreeNet®) as our main classifier to handle this large unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boosted tree models together as our final submission. Through our exploration we conclude that the most critical factors to achieve our results are effective variable preprocessing and selection, proper imbalanced data handling as well as the combination of bagging and boosting techniques.
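The core idea the abstract describes, bagging a series of boosted tree models and averaging their scores, can be sketched as follows. This is a hedged illustration, not the authors' implementation: they used the commercial TreeNet package, whereas this sketch substitutes scikit-learn's `GradientBoostingClassifier`, and the synthetic imbalanced dataset merely stands in for the churn/appetency/up-selling targets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary data standing in for a target like churn
# (the real KDD Cup 2009 data had ~7% positives for churn).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.93, 0.07], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Bagging of boosted trees: train several boosted models on bootstrap
# resamples of the training set, then average their predicted probabilities.
rng = np.random.default_rng(0)
probs = []
for seed in range(5):
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)  # bootstrap sample
    gbt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                     random_state=seed)
    gbt.fit(X_tr[idx], y_tr[idx])
    probs.append(gbt.predict_proba(X_te)[:, 1])

score = np.mean(probs, axis=0)  # bagged score = mean of the boosted models
print(round(roc_auc_score(y_te, score), 3))
```

Averaging the probabilities of independently resampled boosted models reduces variance without touching each model's bias, which matches the paper's stated motivation of improving robustness over a single boosted tree.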

Cite this Paper


BibTeX
@InProceedings{pmlr-v7-xie09,
  title     = {A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database},
  author    = {Xie, Jianjun and Rojkova, Viktoria and Pal, Siddharth and Coggeshall, Stephen},
  booktitle = {Proceedings of KDD-Cup 2009 Competition},
  pages     = {35--43},
  year      = {2009},
  editor    = {Dror, Gideon and Boull{\'e}, Marc and Guyon, Isabelle and Lemaire, Vincent and Vogel, David},
  volume    = {7},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v7/xie09/xie09.pdf},
  url       = {https://proceedings.mlr.press/v7/xie09.html},
  abstract  = {We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency and up-selling of wireless customers. We choose stochastic gradient boosting tree (TreeNet®) as our main classifier to handle this large unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boosted tree models together as our final submission. Through our exploration we conclude that the most critical factors to achieve our results are effective variable preprocessing and selection, proper imbalanced data handling as well as the combination of bagging and boosting techniques.}
}
Endnote
%0 Conference Paper
%T A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database
%A Jianjun Xie
%A Viktoria Rojkova
%A Siddharth Pal
%A Stephen Coggeshall
%B Proceedings of KDD-Cup 2009 Competition
%C Proceedings of Machine Learning Research
%D 2009
%E Gideon Dror
%E Marc Boullé
%E Isabelle Guyon
%E Vincent Lemaire
%E David Vogel
%F pmlr-v7-xie09
%I PMLR
%P 35--43
%U https://proceedings.mlr.press/v7/xie09.html
%V 7
%X We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency and up-selling of wireless customers. We choose stochastic gradient boosting tree (TreeNet®) as our main classifier to handle this large unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boosted tree models together as our final submission. Through our exploration we conclude that the most critical factors to achieve our results are effective variable preprocessing and selection, proper imbalanced data handling as well as the combination of bagging and boosting techniques.
RIS
TY - CPAPER
TI - A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database
AU - Jianjun Xie
AU - Viktoria Rojkova
AU - Siddharth Pal
AU - Stephen Coggeshall
BT - Proceedings of KDD-Cup 2009 Competition
DA - 2009/12/04
ED - Gideon Dror
ED - Marc Boullé
ED - Isabelle Guyon
ED - Vincent Lemaire
ED - David Vogel
ID - pmlr-v7-xie09
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 7
SP - 35
EP - 43
L1 - http://proceedings.mlr.press/v7/xie09/xie09.pdf
UR - https://proceedings.mlr.press/v7/xie09.html
AB - We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency and up-selling of wireless customers. We choose stochastic gradient boosting tree (TreeNet®) as our main classifier to handle this large unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boosted tree models together as our final submission. Through our exploration we conclude that the most critical factors to achieve our results are effective variable preprocessing and selection, proper imbalanced data handling as well as the combination of bagging and boosting techniques.
ER -
APA
Xie, J., Rojkova, V., Pal, S., & Coggeshall, S. (2009). A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database. Proceedings of KDD-Cup 2009 Competition, in Proceedings of Machine Learning Research 7:35-43. Available from https://proceedings.mlr.press/v7/xie09.html.
