A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database

Jianjun Xie, Viktoria Rojkova, Siddharth Pal, Stephen Coggeshall
Proceedings of KDD-Cup 2009 Competition, PMLR 7:35-43, 2009.

Abstract

We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency and up-selling of wireless customers. We choose stochastic gradient boosting tree (TreeNet®) as our main classifier to handle this large unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boosted tree models together as our final submission. Through our exploration we conclude that the most critical factors to achieve our results are effective variable preprocessing and selection, proper imbalanced data handling as well as the combination of bagging and boosting techniques.
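The core idea the abstract describes, bagging a series of boosted tree models and averaging their scores, can be sketched as follows. This is a hedged illustration, not the authors' implementation: they used the commercial TreeNet package, whereas this sketch substitutes scikit-learn's `GradientBoostingClassifier`, and the synthetic imbalanced dataset merely stands in for the churn/appetency/up-selling targets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary data standing in for a target like churn
# (the real KDD Cup 2009 data had ~7% positives for churn).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.93, 0.07], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Bagging of boosted trees: train several boosted models on bootstrap
# resamples of the training set, then average their predicted probabilities.
rng = np.random.default_rng(0)
probs = []
for seed in range(5):
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)  # bootstrap sample
    gbt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                     random_state=seed)
    gbt.fit(X_tr[idx], y_tr[idx])
    probs.append(gbt.predict_proba(X_te)[:, 1])

score = np.mean(probs, axis=0)  # bagged score = mean of the boosted models
print(round(roc_auc_score(y_te, score), 3))
```

Averaging the probabilities of independently resampled boosted models reduces variance without touching each model's bias, which matches the paper's stated motivation of improving robustness over a single boosted tree.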

Cite this Paper


BibTeX
@InProceedings{pmlr-v7-xie09,
  title     = {A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database},
  author    = {Xie, Jianjun and Rojkova, Viktoria and Pal, Siddharth and Coggeshall, Stephen},
  booktitle = {Proceedings of KDD-Cup 2009 Competition},
  pages     = {35--43},
  year      = {2009},
  editor    = {Dror, Gideon and Boull{\'e}, Marc and Guyon, Isabelle and Lemaire, Vincent and Vogel, David},
  volume    = {7},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v7/xie09/xie09.pdf},
  url       = {https://proceedings.mlr.press/v7/xie09.html},
  abstract  = {We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency and up-selling of wireless customers. We choose stochastic gradient boosting tree (TreeNet®) as our main classifier to handle this large unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boosted tree models together as our final submission. Through our exploration we conclude that the most critical factors to achieve our results are effective variable preprocessing and selection, proper imbalanced data handling as well as the combination of bagging and boosting techniques.}
}
Endnote
%0 Conference Paper
%T A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database
%A Jianjun Xie
%A Viktoria Rojkova
%A Siddharth Pal
%A Stephen Coggeshall
%B Proceedings of KDD-Cup 2009 Competition
%C Proceedings of Machine Learning Research
%D 2009
%E Gideon Dror
%E Marc Boullé
%E Isabelle Guyon
%E Vincent Lemaire
%E David Vogel
%F pmlr-v7-xie09
%I PMLR
%P 35--43
%U https://proceedings.mlr.press/v7/xie09.html
%V 7
%X We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency and up-selling of wireless customers. We choose stochastic gradient boosting tree (TreeNet®) as our main classifier to handle this large unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boosted tree models together as our final submission. Through our exploration we conclude that the most critical factors to achieve our results are effective variable preprocessing and selection, proper imbalanced data handling as well as the combination of bagging and boosting techniques.
RIS
TY - CPAPER
TI - A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database
AU - Jianjun Xie
AU - Viktoria Rojkova
AU - Siddharth Pal
AU - Stephen Coggeshall
BT - Proceedings of KDD-Cup 2009 Competition
DA - 2009/12/04
ED - Gideon Dror
ED - Marc Boullé
ED - Isabelle Guyon
ED - Vincent Lemaire
ED - David Vogel
ID - pmlr-v7-xie09
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 7
SP - 35
EP - 43
L1 - http://proceedings.mlr.press/v7/xie09/xie09.pdf
UR - https://proceedings.mlr.press/v7/xie09.html
AB - We present the ideas and methodologies that we used to address the KDD Cup 2009 challenge on rank-ordering the probability of churn, appetency and up-selling of wireless customers. We choose stochastic gradient boosting tree (TreeNet®) as our main classifier to handle this large unbalanced dataset. In order to further improve the robustness and accuracy of our results, we bag a series of boosted tree models together as our final submission. Through our exploration we conclude that the most critical factors to achieve our results are effective variable preprocessing and selection, proper imbalanced data handling as well as the combination of bagging and boosting techniques.
ER -
APA
Xie, J., Rojkova, V., Pal, S., & Coggeshall, S. (2009). A Combination of Boosting and Bagging for KDD Cup 2009 - Fast Scoring on a Large Database. Proceedings of KDD-Cup 2009 Competition, in Proceedings of Machine Learning Research 7:35-43. Available from https://proceedings.mlr.press/v7/xie09.html.
