An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes

Hung-Yi Lo; Kai-Wei Chang; Shang-Tse Chen; Tsung-Hsien Chiang; Chun-Sung Ferng; Cho-Jui Hsieh; Yi-Kuang Ko; Tsung-Ting Kuo; Hung-Che Lai; Ken-Yi Lin; Chia-Hsuan Wang; Hsiang-Fu Yu; Chih-Jen Lin; Hsuan-Tien Lin; Shou-de Lin

Proceedings of KDD-Cup 2009 Competition, PMLR 7:57-64, 2009.

Abstract

This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using the linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranks third place in the slow track of KDD Cup 2009.
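
As a rough illustration of three ideas mentioned in the abstract (a joint multi-class label built from three exclusive binary labels, averaging of prediction scores, and AUC as the evaluation metric), a minimal Python sketch follows. It is not the authors' implementation: the function names, the class numbering, the equal-weight average, and the pairwise AUC computation are all assumptions made for illustration only.

```python
from typing import List, Sequence


def to_joint_class(labels: Sequence[int]) -> int:
    """Map three binary labels (1 = positive, 0 = negative) to one class id.

    Hypothetical encoding: class 0 means "no positive label"; class k (1..3)
    means task k is positive. It relies on the observation that the three
    positive labels are exclusive.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    if len(positives) > 1:
        raise ValueError("positive labels are assumed to be mutually exclusive")
    return positives[0] + 1 if positives else 0


def average_scores(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average per-example scores from several classifiers.

    Equal weights are an assumption; the abstract only says the scores
    are averaged "with careful validation".
    """
    n_models = len(score_lists)
    return [sum(col) / n_models for col in zip(*score_lists)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of examples,
    so it is only suitable for small illustrations.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `to_joint_class([0, 1, 0])` yields class 2 under this encoding, and `average_scores` can be applied to the per-task scores of the three classifiers before computing `auc` for each task.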

Cite this Paper

BibTeX


@InProceedings{pmlr-v7-lo09,
  title = 	 {An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes},
  author = 	 {Lo, Hung-Yi and Chang, Kai-Wei and Chen, Shang-Tse and Chiang, Tsung-Hsien and Ferng, Chun-Sung and Hsieh, Cho-Jui and Ko, Yi-Kuang and Kuo, Tsung-Ting and Lai, Hung-Che and Lin, Ken-Yi and Wang, Chia-Hsuan and Yu, Hsiang-Fu and Lin, Chih-Jen and Lin, Hsuan-Tien and Lin, Shou-de},
  booktitle = 	 {Proceedings of KDD-Cup 2009 Competition},
  pages = 	 {57--64},
  year = 	 {2009},
  editor = 	 {Dror, Gideon and Boullé, Marc and Guyon, Isabelle and Lemaire, Vincent and Vogel, David},
  volume = 	 {7},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {28 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v7/lo09/lo09.pdf},
  url = 	 {https://proceedings.mlr.press/v7/lo09.html},
  abstract = 	 {This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using the linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranks third place in the slow track of KDD Cup 2009.}
}

Endnote

%0 Conference Paper
%T An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes
%A Hung-Yi Lo
%A Kai-Wei Chang
%A Shang-Tse Chen
%A Tsung-Hsien Chiang
%A Chun-Sung Ferng
%A Cho-Jui Hsieh
%A Yi-Kuang Ko
%A Tsung-Ting Kuo
%A Hung-Che Lai
%A Ken-Yi Lin
%A Chia-Hsuan Wang
%A Hsiang-Fu Yu
%A Chih-Jen Lin
%A Hsuan-Tien Lin
%A Shou-de Lin
%B Proceedings of KDD-Cup 2009 Competition
%C Proceedings of Machine Learning Research
%D 2009
%E Gideon Dror
%E Marc Boullé
%E Isabelle Guyon
%E Vincent Lemaire
%E David Vogel
%F pmlr-v7-lo09
%I PMLR
%P 57--64
%U https://proceedings.mlr.press/v7/lo09.html
%V 7
%X This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using the linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranks third place in the slow track of KDD Cup 2009.

RIS


TY  - CPAPER
TI  - An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes
AU  - Hung-Yi Lo
AU  - Kai-Wei Chang
AU  - Shang-Tse Chen
AU  - Tsung-Hsien Chiang
AU  - Chun-Sung Ferng
AU  - Cho-Jui Hsieh
AU  - Yi-Kuang Ko
AU  - Tsung-Ting Kuo
AU  - Hung-Che Lai
AU  - Ken-Yi Lin
AU  - Chia-Hsuan Wang
AU  - Hsiang-Fu Yu
AU  - Chih-Jen Lin
AU  - Hsuan-Tien Lin
AU  - Shou-de Lin
BT  - Proceedings of KDD-Cup 2009 Competition
DA  - 2009/12/04
ED  - Gideon Dror
ED  - Marc Boullé
ED  - Isabelle Guyon
ED  - Vincent Lemaire
ED  - David Vogel
ID  - pmlr-v7-lo09
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 7
SP  - 57
EP  - 64
L1  - http://proceedings.mlr.press/v7/lo09/lo09.pdf
UR  - https://proceedings.mlr.press/v7/lo09.html
AB  - This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem, and solve an l1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner, which is capable of handling different types of features and missing values, and use AdaBoost to improve the base learner. Finally, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. The parameters are tuned using cross-validation results rather than the 10% test results on the competition website. Based on the observation that the three positive labels are exclusive, we conduct a post-processing step using the linear SVM to jointly adjust the prediction scores of each classifier on the three tasks. Then, we average these prediction scores with careful validation to get the final outputs. Our final average AUC on the whole test set is 0.8461, which ranks third place in the slow track of KDD Cup 2009.
ER  -

APA


Lo, H., Chang, K., Chen, S., Chiang, T., Ferng, C.S., Hsieh, C., Ko, Y., Kuo, T., Lai, H., Lin, K., Wang, C., Yu, H., Lin, C., Lin, H. & Lin, S. (2009). An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes. Proceedings of KDD-Cup 2009 Competition, in Proceedings of Machine Learning Research 7:57-64. Available from https://proceedings.mlr.press/v7/lo09.html.
