Analysis of the KDD Cup 2009: Fast Scoring on a Large Orange Customer Database

Isabelle Guyon, Vincent Lemaire, Marc Boullé, Gideon Dror, David Vogel
Proceedings of KDD-Cup 2009 Competition, PMLR 7:1-22, 2009.

Abstract

We organized the KDD Cup 2009 around a marketing problem, with the goal of identifying data mining techniques capable of rapidly building predictive models and scoring new entries on a large database. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offered the opportunity to work on large marketing databases from the French telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling). The challenge started on March 10, 2009 and ended on May 11, 2009. It attracted over 450 participants from 46 countries. We attribute the popularity of the challenge to several factors: (1) a generic problem relevant to industry (a classification problem) that nevertheless presents a number of scientific and technical challenges of real practical interest: a large number of features (15,000), a large number of training examples (50,000), unbalanced class proportions (fewer than 10% of the examples belong to the positive class), noisy data, many missing values, and categorical variables with many different values; (2) prize money of 10,000 euros offered by Orange; (3) a well-designed protocol and website, drawing on past experience; and (4) an effective advertising campaign using mailings and a teleconference to answer potential participants' questions. The results of the challenge were discussed at the KDD conference (June 28, 2009). The principal conclusions are that ensemble methods are very effective and that ensembles of decision trees offer off-the-shelf solutions to problems with large numbers of samples and attributes, mixed variable types, and many missing values. The data and the platform of the challenge remain available for research and educational purposes at http://www.kddcup-orange.com/.
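The conclusion above concerns ensembles of decision trees on data with many missing values and unbalanced classes. As a minimal illustration only, and not a reconstruction of any participant's method or of the Orange data, the following standard-library Python sketch bags depth-1 decision trees (stumps), each of which sends missing values to a branch of their own. All names (`fit_stump`, `fit_bagged_stumps`, `score`, `make_data`) and parameters are hypothetical, and the sketch assumes numeric features only; categorical variables would need a separate encoding step.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical illustration: a row is a list of numeric features; None marks a missing value.
Row = List[Optional[float]]
# (feature index, threshold, label if <= threshold, label if > threshold, label if missing)
Stump = Tuple[int, float, int, int, int]


def _majority(counts: List[int]) -> int:
    """counts = [negatives, positives]; ties and empty branches go to class 0."""
    return 1 if counts[1] > counts[0] else 0


def fit_stump(X: Sequence[Row], y: Sequence[int]) -> Stump:
    """Depth-1 tree minimizing training errors; missing values get their own branch."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X if row[j] is not None}):
            left, right, missing = [0, 0], [0, 0], [0, 0]
            for row, label in zip(X, y):
                v = row[j]
                bucket = missing if v is None else (left if v <= t else right)
                bucket[label] += 1
            # Each branch predicts its majority class, so its errors are the minority count.
            errors = min(left) + min(right) + min(missing)
            if best is None or errors < best[0]:
                best = (errors, (j, t, _majority(left), _majority(right), _majority(missing)))
    if best is None:  # every feature is entirely missing: predict the overall majority
        label = _majority([len(y) - sum(y), sum(y)])
        return (0, 0.0, label, label, label)
    return best[1]


def predict_stump(stump: Stump, row: Row) -> int:
    j, t, left, right, missing = stump
    v = row[j]
    if v is None:
        return missing
    return left if v <= t else right


def fit_bagged_stumps(X: Sequence[Row], y: Sequence[int],
                      n_estimators: int = 11, seed: int = 0) -> List[Stump]:
    """Bagging: each stump is trained on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def score(ensemble: List[Stump], row: Row) -> float:
    """Fraction of stumps voting for the positive class, a score in [0, 1]."""
    return sum(predict_stump(s, row) for s in ensemble) / len(ensemble)


def make_data(n: int, seed: int) -> Tuple[List[Row], List[int]]:
    """Hypothetical synthetic data: about 20% positives and about 20% missing values."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x0, x1 = rng.random(), rng.random()  # x0 is informative, x1 is pure noise
        y.append(1 if x0 > 0.8 else 0)
        X.append([None if rng.random() < 0.2 else x0, None if rng.random() < 0.2 else x1])
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_data(200, seed=1)
    X_test, y_test = make_data(200, seed=2)
    model = fit_bagged_stumps(X_train, y_train)
    preds = [1 if score(model, r) >= 0.5 else 0 for r in X_test]
    acc = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    print(f"test accuracy: {acc:.2f}")
```

Averaging the votes yields a score rather than only a hard label, which suits ranking customers by propensity; a real entry would use deeper trees or boosting and far more estimators.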

Cite this Paper


BibTeX
@InProceedings{pmlr-v7-guyon09,
  title = {Analysis of the KDD Cup 2009: Fast Scoring on a Large Orange Customer Database},
  author = {Guyon, Isabelle and Lemaire, Vincent and Boullé, Marc and Dror, Gideon and Vogel, David},
  booktitle = {Proceedings of KDD-Cup 2009 Competition},
  pages = {1--22},
  year = {2009},
  editor = {Dror, Gideon and Boullé, Marc and Guyon, Isabelle and Lemaire, Vincent and Vogel, David},
  volume = {7},
  series = {Proceedings of Machine Learning Research},
  address = {New York, New York, USA},
  month = {28 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v7/guyon09/guyon09.pdf},
  url = {https://proceedings.mlr.press/v7/guyon09.html},
  abstract = {We organized the KDD Cup 2009 around a marketing problem, with the goal of identifying data mining techniques capable of rapidly building predictive models and scoring new entries on a large database. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offered the opportunity to work on large marketing databases from the French telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling). The challenge started on March 10, 2009 and ended on May 11, 2009. It attracted over 450 participants from 46 countries. We attribute the popularity of the challenge to several factors: (1) a generic problem relevant to industry (a classification problem) that nevertheless presents a number of scientific and technical challenges of real practical interest: a large number of features (15,000), a large number of training examples (50,000), unbalanced class proportions (fewer than 10\% of the examples belong to the positive class), noisy data, many missing values, and categorical variables with many different values; (2) prize money of 10,000 euros offered by Orange; (3) a well-designed protocol and website, drawing on past experience; and (4) an effective advertising campaign using mailings and a teleconference to answer potential participants' questions. The results of the challenge were discussed at the KDD conference (June 28, 2009). The principal conclusions are that ensemble methods are very effective and that ensembles of decision trees offer off-the-shelf solutions to problems with large numbers of samples and attributes, mixed variable types, and many missing values. The data and the platform of the challenge remain available for research and educational purposes at \texttt{http://www.kddcup-orange.com/}.}
}
EndNote
%0 Conference Paper
%T Analysis of the KDD Cup 2009: Fast Scoring on a Large Orange Customer Database
%A Isabelle Guyon
%A Vincent Lemaire
%A Marc Boullé
%A Gideon Dror
%A David Vogel
%B Proceedings of KDD-Cup 2009 Competition
%C Proceedings of Machine Learning Research
%D 2009
%E Gideon Dror
%E Marc Boullé
%E Isabelle Guyon
%E Vincent Lemaire
%E David Vogel
%F pmlr-v7-guyon09
%I PMLR
%P 1--22
%U https://proceedings.mlr.press/v7/guyon09.html
%V 7
%X We organized the KDD Cup 2009 around a marketing problem, with the goal of identifying data mining techniques capable of rapidly building predictive models and scoring new entries on a large database. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offered the opportunity to work on large marketing databases from the French telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling). The challenge started on March 10, 2009 and ended on May 11, 2009. It attracted over 450 participants from 46 countries. We attribute the popularity of the challenge to several factors: (1) a generic problem relevant to industry (a classification problem) that nevertheless presents a number of scientific and technical challenges of real practical interest: a large number of features (15,000), a large number of training examples (50,000), unbalanced class proportions (fewer than 10% of the examples belong to the positive class), noisy data, many missing values, and categorical variables with many different values; (2) prize money of 10,000 euros offered by Orange; (3) a well-designed protocol and website, drawing on past experience; and (4) an effective advertising campaign using mailings and a teleconference to answer potential participants' questions. The results of the challenge were discussed at the KDD conference (June 28, 2009). The principal conclusions are that ensemble methods are very effective and that ensembles of decision trees offer off-the-shelf solutions to problems with large numbers of samples and attributes, mixed variable types, and many missing values. The data and the platform of the challenge remain available for research and educational purposes at http://www.kddcup-orange.com/.
RIS
TY  - CPAPER
TI  - Analysis of the KDD Cup 2009: Fast Scoring on a Large Orange Customer Database
AU  - Isabelle Guyon
AU  - Vincent Lemaire
AU  - Marc Boullé
AU  - Gideon Dror
AU  - David Vogel
BT  - Proceedings of KDD-Cup 2009 Competition
DA  - 2009/12/04
ED  - Gideon Dror
ED  - Marc Boullé
ED  - Isabelle Guyon
ED  - Vincent Lemaire
ED  - David Vogel
ID  - pmlr-v7-guyon09
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 7
SP  - 1
EP  - 22
L1  - http://proceedings.mlr.press/v7/guyon09/guyon09.pdf
UR  - https://proceedings.mlr.press/v7/guyon09.html
AB  - We organized the KDD Cup 2009 around a marketing problem, with the goal of identifying data mining techniques capable of rapidly building predictive models and scoring new entries on a large database. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offered the opportunity to work on large marketing databases from the French telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling). The challenge started on March 10, 2009 and ended on May 11, 2009. It attracted over 450 participants from 46 countries. We attribute the popularity of the challenge to several factors: (1) a generic problem relevant to industry (a classification problem) that nevertheless presents a number of scientific and technical challenges of real practical interest: a large number of features (15,000), a large number of training examples (50,000), unbalanced class proportions (fewer than 10% of the examples belong to the positive class), noisy data, many missing values, and categorical variables with many different values; (2) prize money of 10,000 euros offered by Orange; (3) a well-designed protocol and website, drawing on past experience; and (4) an effective advertising campaign using mailings and a teleconference to answer potential participants' questions. The results of the challenge were discussed at the KDD conference (June 28, 2009). The principal conclusions are that ensemble methods are very effective and that ensembles of decision trees offer off-the-shelf solutions to problems with large numbers of samples and attributes, mixed variable types, and many missing values. The data and the platform of the challenge remain available for research and educational purposes at http://www.kddcup-orange.com/.
ER  - 
APA
Guyon, I., Lemaire, V., Boullé, M., Dror, G., & Vogel, D. (2009). Analysis of the KDD Cup 2009: Fast Scoring on a Large Orange Customer Database. Proceedings of KDD-Cup 2009 Competition, in Proceedings of Machine Learning Research 7:1-22. Available from https://proceedings.mlr.press/v7/guyon09.html.
