Feature Engineering in User’s Music Preference Prediction


Jianjun Xie, Scott Leishman, Liang Tian, David Lisuk, Seongjoon Koo, Matthias Blume
Proceedings of KDD Cup 2011, PMLR 18:183-197, 2012.


The second track of KDD Cup 2011 asked contestants to separate a user’s highly rated songs from unrated songs for a large set of Yahoo! Music listeners. We cast this task as a binary classification problem and addressed it using gradient boosted decision trees. We created a set of highly predictive features, each with a clear explanation. These features were grouped into five categories: hierarchical linkage features, track-based statistical features, user-based statistical features, features derived from the k-nearest neighbors of the users, and features derived from the k-nearest neighbors of the items. No music domain knowledge was needed to create these features. We demonstrate that each group of features improved the prediction accuracy of the classification model. We also discuss the most predictive features in each category.
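As a rough illustration of three of the feature groups named above (user-based statistics, track-based statistics, and hierarchical linkage), the following minimal sketch builds one feature row for a (user, track) pair. The toy ratings, the track-to-album mapping, the feature names, and the helper `build_features` are all hypothetical and are not taken from the paper; the actual feature definitions used by the authors may differ. The k-nearest-neighbor feature groups are not shown.

```python
from statistics import mean

# Hypothetical toy data: (user, track, rating) triples.
ratings = [
    ("u1", "t1", 90),
    ("u1", "t2", 80),
    ("u2", "t1", 30),
    ("u2", "t3", 100),
]

# Hypothetical item hierarchy: each track belongs to one album.
track_album = {"t1": "a1", "t2": "a1", "t3": "a2", "t4": "a1"}


def build_features(ratings, track_album, user, track):
    """Build an illustrative feature row for one (user, track) candidate."""
    # User-based statistics: how this user rates in general.
    user_ratings = [r for u, _, r in ratings if u == user]
    # Track-based statistics: how this track is rated by everyone.
    track_ratings = [r for _, t, r in ratings if t == track]
    # Hierarchical linkage: this user's ratings of other tracks
    # that share an album with the candidate track.
    album = track_album.get(track)
    same_album = [
        r
        for u, t, r in ratings
        if u == user and t != track and track_album.get(t) == album
    ]
    return {
        "user_mean": mean(user_ratings) if user_ratings else 0.0,
        "user_count": len(user_ratings),
        "track_mean": mean(track_ratings) if track_ratings else 0.0,
        "track_count": len(track_ratings),
        "album_mean_by_user": mean(same_album) if same_album else 0.0,
        "album_count_by_user": len(same_album),
    }


row = build_features(ratings, track_album, "u1", "t4")
print(row)
```

Rows of this kind, labeled 1 for a highly rated track and 0 for an unrated one, could then be passed to any gradient boosted decision tree classifier to obtain the binary prediction.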
