Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge

Patrick Doetsch, Christian Buck, Pavlo Golik, Niklas Hoppe, Michael Kramp, Johannes Laudenberg, Christian Oberdörfer, Pascal Steingrube, Jens Forster, Arne Mauser
Proceedings of KDD-Cup 2009 Competition, PMLR 7:77-88, 2009.

Abstract

In this work, we describe our approach to the “Small Challenge” of the KDD Cup 2009, a classification task with incomplete data. Preprocessing, feature extraction and model selection are documented in detail. We suggest a criterion based on the number of missing values to select a suitable imputation method for each feature. Logistic Model Trees (LMT) are extended with a split criterion optimizing the Area under the ROC Curve (AUC), which was the requested evaluation criterion. By stacking boosted decision stumps and LMT we achieved the best result for the “Small Challenge” without making use of additional data from other feature sets, resulting in an AUC score of 0.8081. We also present results of an AUC-optimizing model combination that scored only slightly worse, with an AUC score of 0.8074.
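The abstract's idea of choosing an imputation method per feature from its missing-value count can be sketched as follows. This is an illustrative toy version only: the 20% threshold, the `choose_imputation` helper, and the two strategies (mean fill vs. an out-of-range constant) are assumptions for demonstration, not the criterion actually used in the paper.

```python
import numpy as np

def choose_imputation(col, threshold=0.2):
    """Pick an imputation strategy for one numeric feature based on its
    fraction of missing values. The threshold and strategies are
    illustrative, not taken from the paper."""
    frac_missing = np.isnan(col).mean()
    if frac_missing == 0.0:
        return col, "none"
    if frac_missing < threshold:
        # Few missing values: fill with the mean of the observed values.
        filled = np.where(np.isnan(col), np.nanmean(col), col)
        return filled, "mean"
    # Many missing values: "missingness" itself may be informative, so
    # fill with a constant outside the observed range to keep it separable.
    filled = np.where(np.isnan(col), np.nanmin(col) - 1.0, col)
    return filled, "constant"

# Feature with 1 of 10 values missing: mean imputation is selected.
a, strat_a = choose_imputation(
    np.array([1.0, 2.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]))
# Feature with 3 of 5 values missing: constant imputation is selected.
b, strat_b = choose_imputation(
    np.array([1.0, np.nan, np.nan, np.nan, 2.0]))
```

Applied column-wise to the challenge's feature matrix, such a rule lets sparse and dense features receive different treatment before model training.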

Cite this Paper


BibTeX
@InProceedings{pmlr-v7-doetsch09,
  title     = {Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge},
  author    = {Doetsch, Patrick and Buck, Christian and Golik, Pavlo and Hoppe, Niklas and Kramp, Michael and Laudenberg, Johannes and Oberdörfer, Christian and Steingrube, Pascal and Forster, Jens and Mauser, Arne},
  booktitle = {Proceedings of KDD-Cup 2009 Competition},
  pages     = {77--88},
  year      = {2009},
  editor    = {Dror, Gideon and Boullé, Marc and Guyon, Isabelle and Lemaire, Vincent and Vogel, David},
  volume    = {7},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v7/doetsch09/doetsch09.pdf},
  url       = {https://proceedings.mlr.press/v7/doetsch09.html},
  abstract  = {In this work, we describe our approach to the “Small Challenge” of the KDD Cup 2009, a classification task with incomplete data. Preprocessing, feature extraction and model selection are documented in detail. We suggest a criterion based on the number of missing values to select a suitable imputation method for each feature. Logistic Model Trees (LMT) are extended with a split criterion optimizing the Area under the ROC Curve (AUC), which was the requested evaluation criterion. By stacking boosted decision stumps and LMT we achieved the best result for the “Small Challenge” without making use of additional data from other feature sets, resulting in an AUC score of 0.8081. We also present results of an AUC-optimizing model combination that scored only slightly worse, with an AUC score of 0.8074.}
}
Endnote
%0 Conference Paper
%T Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge
%A Patrick Doetsch
%A Christian Buck
%A Pavlo Golik
%A Niklas Hoppe
%A Michael Kramp
%A Johannes Laudenberg
%A Christian Oberdörfer
%A Pascal Steingrube
%A Jens Forster
%A Arne Mauser
%B Proceedings of KDD-Cup 2009 Competition
%C Proceedings of Machine Learning Research
%D 2009
%E Gideon Dror
%E Marc Boullé
%E Isabelle Guyon
%E Vincent Lemaire
%E David Vogel
%F pmlr-v7-doetsch09
%I PMLR
%P 77--88
%U https://proceedings.mlr.press/v7/doetsch09.html
%V 7
%X In this work, we describe our approach to the “Small Challenge” of the KDD Cup 2009, a classification task with incomplete data. Preprocessing, feature extraction and model selection are documented in detail. We suggest a criterion based on the number of missing values to select a suitable imputation method for each feature. Logistic Model Trees (LMT) are extended with a split criterion optimizing the Area under the ROC Curve (AUC), which was the requested evaluation criterion. By stacking boosted decision stumps and LMT we achieved the best result for the “Small Challenge” without making use of additional data from other feature sets, resulting in an AUC score of 0.8081. We also present results of an AUC-optimizing model combination that scored only slightly worse, with an AUC score of 0.8074.
RIS
TY  - CPAPER
TI  - Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge
AU  - Patrick Doetsch
AU  - Christian Buck
AU  - Pavlo Golik
AU  - Niklas Hoppe
AU  - Michael Kramp
AU  - Johannes Laudenberg
AU  - Christian Oberdörfer
AU  - Pascal Steingrube
AU  - Jens Forster
AU  - Arne Mauser
BT  - Proceedings of KDD-Cup 2009 Competition
DA  - 2009/12/04
ED  - Gideon Dror
ED  - Marc Boullé
ED  - Isabelle Guyon
ED  - Vincent Lemaire
ED  - David Vogel
ID  - pmlr-v7-doetsch09
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 7
SP  - 77
EP  - 88
L1  - http://proceedings.mlr.press/v7/doetsch09/doetsch09.pdf
UR  - https://proceedings.mlr.press/v7/doetsch09.html
AB  - In this work, we describe our approach to the “Small Challenge” of the KDD Cup 2009, a classification task with incomplete data. Preprocessing, feature extraction and model selection are documented in detail. We suggest a criterion based on the number of missing values to select a suitable imputation method for each feature. Logistic Model Trees (LMT) are extended with a split criterion optimizing the Area under the ROC Curve (AUC), which was the requested evaluation criterion. By stacking boosted decision stumps and LMT we achieved the best result for the “Small Challenge” without making use of additional data from other feature sets, resulting in an AUC score of 0.8081. We also present results of an AUC-optimizing model combination that scored only slightly worse, with an AUC score of 0.8074.
ER  -
APA
Doetsch, P., Buck, C., Golik, P., Hoppe, N., Kramp, M., Laudenberg, J., Oberdörfer, C., Steingrube, P., Forster, J. & Mauser, A. (2009). Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge. Proceedings of KDD-Cup 2009 Competition, in Proceedings of Machine Learning Research 7:77-88. Available from https://proceedings.mlr.press/v7/doetsch09.html.