Feature Engineering in User’s Music Preference Prediction

Jianjun Xie; Scott Leishman; Liang Tian; David Lisuk; Seongjoon Koo; Matthias Blume

Proceedings of KDD Cup 2011, PMLR 18:183-197, 2012.

Abstract

The second track of this year’s KDD Cup asked contestants to separate a user’s highly rated songs from unrated songs for a large set of Yahoo! Music listeners. We cast this task as a binary classification problem and addressed it utilizing gradient boosted decision trees. We created a set of highly predictive features, each with a clear explanation. These features were grouped into five categories: hierarchical linkage features, track-based statistical features, user-based statistical features, features derived from the k-nearest neighbors of the users, and features derived from the k-nearest neighbors of the items. No music domain knowledge was needed to create these features. We demonstrate that each group of features improved the prediction accuracy of the classification model. We also discuss the top predictive features of each category in this paper.
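As a rough illustration of the track-based and user-based statistical feature groups named in the abstract, the sketch below derives simple count and mean-rating features for a candidate (user, track) pair. The toy ratings, the 0-100 rating scale, and the helper names `build_stats` and `feature_row` are all assumptions made for illustration; they are not taken from the paper, whose actual feature definitions may differ. In a pipeline like the one described, rows of this kind would be passed to a gradient boosted decision tree learner, which is not shown here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy ratings: (user_id, track_id, rating), assumed 0-100 scale.
ratings = [
    ("u1", "t1", 90), ("u1", "t2", 30),
    ("u2", "t1", 80), ("u2", "t3", 100),
    ("u3", "t2", 50),
]

def build_stats(ratings):
    """Return per-user and per-track lookups of (rating count, mean rating)."""
    by_user, by_track = defaultdict(list), defaultdict(list)
    for user, track, rating in ratings:
        by_user[user].append(rating)
        by_track[track].append(rating)
    user_stats = {u: (len(r), mean(r)) for u, r in by_user.items()}
    track_stats = {t: (len(r), mean(r)) for t, r in by_track.items()}
    return user_stats, track_stats

def feature_row(user, track, user_stats, track_stats):
    """Build one feature vector for a candidate (user, track) pair.

    Ids with no rating history fall back to zeros (an illustrative choice).
    """
    u_count, u_mean = user_stats.get(user, (0, 0.0))
    t_count, t_mean = track_stats.get(track, (0, 0.0))
    return [u_count, u_mean, t_count, t_mean]

user_stats, track_stats = build_stats(ratings)
# u1 has not rated t3, so this pair is a candidate to be scored.
row = feature_row("u1", "t3", user_stats, track_stats)
```

Each entry in `row` has a direct reading (how active the user is, how generously they rate, how popular the track is, how well it is rated), which matches the abstract's emphasis on features with a clear explanation.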

Cite this Paper

BibTeX


@InProceedings{pmlr-v18-xie12a,
  title = 	 {Feature Engineering in User’s Music Preference Prediction},
  author = 	 {Xie, Jianjun and Leishman, Scott and Tian, Liang and Lisuk, David and Koo, Seongjoon and Blume, Matthias},
  booktitle = 	 {Proceedings of KDD Cup 2011},
  pages = 	 {183--197},
  year = 	 {2012},
  editor = 	 {Dror, Gideon and Koren, Yehuda and Weimer, Markus},
  volume = 	 {18},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v18/xie12a/xie12a.pdf},
  url = 	 {https://proceedings.mlr.press/v18/xie12a.html},
  abstract = 	 {The second track of this year’s KDD Cup asked contestants to separate a user’s highly rated songs from unrated songs for a large set of Yahoo! Music listeners. We cast this task as a binary classification problem and addressed it utilizing gradient boosted decision trees. We created a set of highly predictive features, each with a clear explanation. These features were grouped into five categories: hierarchical linkage features, track-based statistical features, user-based statistical features, features derived from the k-nearest neighbors of the users, and features derived from the k-nearest neighbors of the items. No music domain knowledge was needed to create these features. We demonstrate that each group of features improved the prediction accuracy of the classification model. We also discuss the top predictive features of each category in this paper.}
}

Endnote

%0 Conference Paper
%T Feature Engineering in User’s Music Preference Prediction
%A Jianjun Xie
%A Scott Leishman
%A Liang Tian
%A David Lisuk
%A Seongjoon Koo
%A Matthias Blume
%B Proceedings of KDD Cup 2011
%C Proceedings of Machine Learning Research
%D 2012
%E Gideon Dror
%E Yehuda Koren
%E Markus Weimer
%F pmlr-v18-xie12a
%I PMLR
%P 183--197
%U https://proceedings.mlr.press/v18/xie12a.html
%V 18
%X The second track of this year’s KDD Cup asked contestants to separate a user’s highly rated songs from unrated songs for a large set of Yahoo! Music listeners. We cast this task as a binary classification problem and addressed it utilizing gradient boosted decision trees. We created a set of highly predictive features, each with a clear explanation. These features were grouped into five categories: hierarchical linkage features, track-based statistical features, user-based statistical features, features derived from the k-nearest neighbors of the users, and features derived from the k-nearest neighbors of the items. No music domain knowledge was needed to create these features. We demonstrate that each group of features improved the prediction accuracy of the classification model. We also discuss the top predictive features of each category in this paper.

RIS


TY  - CPAPER
TI  - Feature Engineering in User’s Music Preference Prediction
AU  - Jianjun Xie
AU  - Scott Leishman
AU  - Liang Tian
AU  - David Lisuk
AU  - Seongjoon Koo
AU  - Matthias Blume
BT  - Proceedings of KDD Cup 2011
DA  - 2012/06/01
ED  - Gideon Dror
ED  - Yehuda Koren
ED  - Markus Weimer
ID  - pmlr-v18-xie12a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 18
SP  - 183
EP  - 197
L1  - http://proceedings.mlr.press/v18/xie12a/xie12a.pdf
UR  - https://proceedings.mlr.press/v18/xie12a.html
AB  - The second track of this year’s KDD Cup asked contestants to separate a user’s highly rated songs from unrated songs for a large set of Yahoo! Music listeners. We cast this task as a binary classification problem and addressed it utilizing gradient boosted decision trees. We created a set of highly predictive features, each with a clear explanation. These features were grouped into five categories: hierarchical linkage features, track-based statistical features, user-based statistical features, features derived from the k-nearest neighbors of the users, and features derived from the k-nearest neighbors of the items. No music domain knowledge was needed to create these features. We demonstrate that each group of features improved the prediction accuracy of the classification model. We also discuss the top predictive features of each category in this paper.
ER  -

APA


Xie, J., Leishman, S., Tian, L., Lisuk, D., Koo, S., & Blume, M. (2012). Feature Engineering in User’s Music Preference Prediction. Proceedings of KDD Cup 2011, in Proceedings of Machine Learning Research 18:183-197. Available from https://proceedings.mlr.press/v18/xie12a.html.
