A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database

Jianjun Xie, Viktoria Rojkova, Siddharth Pal, Stephen Coggeshall
Proceedings of KDD-Cup 2009 Competition, PMLR 7:35-43, 2009.

Abstract

We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge of rank-ordering wireless customers by their probability of churn, appetency, and up-selling. We choose stochastic gradient boosted trees (TreeNet®) as our main classifier to handle this large, unbalanced dataset. To further improve the robustness and accuracy of our results, we bag a series of boosted-tree models together as our final submission. Through our exploration, we conclude that the most critical factors in achieving our results are effective variable preprocessing and selection, proper handling of the imbalanced data, and the combination of bagging and boosting techniques.
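
Since TreeNet® is proprietary and the competition data are not reproduced here, the following is a minimal sketch of the bagging-of-boosted-trees idea described in the abstract, using scikit-learn's GradientBoostingClassifier as a stand-in learner and a synthetic rare-positive dataset. All function names, parameter values, and the balanced-bootstrap scheme are illustrative assumptions, not the authors' actual settings.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

def bagged_boosted_scores(X_tr, y_tr, X_te, n_bags=10, seed=0):
    """Train n_bags boosted-tree models, each on a class-balanced
    bootstrap of the training data, and average their scores."""
    pos, neg = X_tr[y_tr == 1], X_tr[y_tr == 0]
    scores = np.zeros(len(X_te))
    for b in range(n_bags):
        # One simple way to handle the rare positive class: keep all
        # positives and bootstrap an equal-size sample of negatives.
        neg_b = resample(neg, n_samples=len(pos), random_state=seed + b)
        X_b = np.vstack([pos, neg_b])
        y_b = np.r_[np.ones(len(pos)), np.zeros(len(neg_b))]
        gbt = GradientBoostingClassifier(n_estimators=200,
                                         learning_rate=0.05,
                                         max_depth=3,
                                         random_state=seed + b)
        gbt.fit(X_b, y_b)
        scores += gbt.predict_proba(X_te)[:, 1]
    return scores / n_bags  # averaged scores rank-order the test set

# Synthetic stand-in for the (unavailable) KDD Cup data: ~7% positives.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.93], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)
print("AUC:", roc_auc_score(y_te, bagged_boosted_scores(X_tr, y_tr, X_te)))

Averaging the probability scores of several boosted models trained on different bootstrap samples is what gives the bagging step its variance reduction; rank-ordering metrics such as AUC depend only on the relative ordering of these averaged scores.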

Cite this Paper


BibTeX
@InProceedings{pmlr-v7-xie09,
  title     = {A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database},
  author    = {Jianjun Xie and Viktoria Rojkova and Siddharth Pal and Stephen Coggeshall},
  booktitle = {Proceedings of KDD-Cup 2009 Competition},
  pages     = {35--43},
  year      = {2009},
  editor    = {Gideon Dror and Marc Boullé and Isabelle Guyon and Vincent Lemaire and David Vogel},
  volume    = {7},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v7/xie09/xie09.pdf},
  url       = {http://proceedings.mlr.press/v7/xie09.html}
}
APA
Xie, J., Rojkova, V., Pal, S. & Coggeshall, S. (2009). A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database. Proceedings of KDD-Cup 2009 Competition, in PMLR 7:35-43.