The Yahoo! Music Dataset and KDD-Cup’11
Proceedings of KDD Cup 2011, PMLR 18:3-18, 2012.
KDD-Cup 2011 challenged the community to identify user tastes in music by leveraging Yahoo! Music user ratings. The competition hosted two tracks, which were based on two datasets sampled from the raw data, including hundreds of millions of ratings. The underlying ratings were given to four types of musical items: tracks, albums, artists, and genres, forming a four-level hierarchical taxonomy. The challenge started on March 15, 2011 and ended on June 30, 2011, attracting 2389 participants, 2100 of whom were active by the end of the competition. The popularity of the challenge is related to the fact that learning large-scale recommender systems is a generic problem that is highly relevant to industry. In addition, the contest drew interest by introducing a number of scientific and technical challenges, including dataset size, the hierarchical structure of items, high-resolution timestamps of ratings, and a non-conventional ranking-based task. This paper provides the organizers’ account of the contest, including a detailed analysis of the datasets, a discussion of the contest goals and actual conduct, and lessons learned throughout the contest.