- title: 'The Yahoo! Music Dataset and KDD-Cup’11' abstract: 'KDD-Cup 2011 challenged the community to identify user tastes in music by leveraging Yahoo! Music user ratings. The competition hosted two tracks, which were based on two datasets sampled from the raw data, including hundreds of millions of ratings. The underlying ratings were given to four types of musical items: tracks, albums, artists, and genres, forming a four-level hierarchical taxonomy. The challenge started on March 15, 2011 and ended on June 30, 2011, attracting 2389 participants, 2100 of whom were active by the end of the competition. The popularity of the challenge is related to the fact that learning large-scale recommender systems is a generic problem, highly relevant to the industry. In addition, the contest drew interest by introducing a number of scientific and technical challenges including dataset size, hierarchical structure of items, high resolution timestamps of ratings, and a non-conventional ranking-based task. This paper provides the organizers’ account of the contest, including: a detailed analysis of the datasets, discussion of the contest goals and actual conduct, and lessons learned throughout the contest.'
volume: 18 URL: https://proceedings.mlr.press/v18/dror12a.html PDF: http://proceedings.mlr.press/v18/dror12a/dror12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-dror12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Gideon family: Dror - given: Noam family: Koenigstein - given: Yehuda family: Koren - given: Markus family: Weimer editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 3-18 id: dror12a issued: date-parts: - 2012 - 6 - 1 firstpage: 3 lastpage: 18 published: 2012-06-01 00:00:00 +0000 - title: 'A Linear Ensemble of Individual and Blended Models for Music Rating Prediction' abstract: 'Track 1 of KDD Cup 2011 aims at predicting the rating behavior of users in the Yahoo! Music system. At National Taiwan University, we organize a course that teams up students to work on both tracks of KDD Cup 2011. For track 1, we first tackle the problem by building variants of existing individual models, including Matrix Factorization, Restricted Boltzmann Machine, k-Nearest Neighbors, Probabilistic Latent Semantic Analysis, Probabilistic Principal Component Analysis and Supervised Regression. We then blend the individual models along with some carefully extracted features in a non-linear manner. A large linear ensemble that contains both the individual and the blended models is learned and taken through some post-processing steps to form the final solution. The four stages: individual model building, non-linear blending, linear ensemble and post-processing lead to a successful final solution, within which techniques on feature engineering and aggregation (blending and ensemble learning) play crucial roles. Our team is the first prize winner of both tracks of KDD Cup 2011.'
volume: 18 URL: https://proceedings.mlr.press/v18/chen12a.html PDF: http://proceedings.mlr.press/v18/chen12a/chen12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-chen12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Po-Lung family: Chen - given: Chen-Tse family: Tsai - given: Yao-Nan family: Chen - given: Ku-Chun family: Chou - given: Chun-Liang family: Li - given: Cheng-Hao family: Tsai - given: Kuan-Wei family: Wu - given: Yu-Cheng family: Chou - given: Chung-Yi family: Li - given: Wei-Shih family: Lin - given: Shu-Hao family: Yu - given: Rong-Bing family: Chiu - given: Chieh-Yen family: Lin - given: Chien-Chih family: Wang - given: Po-Wei family: Wang - given: Wei-Lun family: Su - given: Chen-Hung family: Wu - given: Tsung-Ting family: Kuo - given: Todd G. family: McKenzie - given: Ya-Hsuan family: Chang - given: Chun-Sung family: Ferng - given: Chia-Mau family: Ni - given: Hsuan-Tien family: Lin - given: Chih-Jen family: Lin - given: Shou-De family: Lin editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 21-60 id: chen12a issued: date-parts: - 2012 - 6 - 1 firstpage: 21 lastpage: 60 published: 2012-06-01 00:00:00 +0000 - title: 'Collaborative Filtering Ensemble' abstract: 'This paper provides the solution of the team “commendo” on the Track 1 dataset of the KDD Cup 2011 (Dror et al.). Yahoo Labs provides a snapshot of their music-rating database as the dataset for the competition. We get approximately 260 million ratings from 1 million users on 600k items. Timestamp and taxonomy information are added to the ratings. The goal of the competition was to predict unknown ratings on a test set with RMSE as the error measure. Our final submission is a blend of different collaborative filtering algorithms. The algorithms are trained consecutively and they are blended together with a neural network.'
volume: 18 URL: https://proceedings.mlr.press/v18/jahrer12a.html PDF: http://proceedings.mlr.press/v18/jahrer12a/jahrer12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-jahrer12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Michael family: Jahrer - given: Andreas family: Töscher editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 61-74 id: jahrer12a issued: date-parts: - 2012 - 6 - 1 firstpage: 61 lastpage: 74 published: 2012-06-01 00:00:00 +0000 - title: 'Rating Prediction with Informative Ensemble of Multi-Resolution Dynamic Models' abstract: 'The Yahoo! music rating data set in KDD Cup 2011 raises several interesting challenges: (1) The data covers a lengthy time period of more than eight years. (2) Not only are training ratings associated with date and time information, so are the test ratings. (3) The items form a hierarchy consisting of four types of items: genres, artists, albums and tracks. To capture the rich temporal dynamics within the data set, we design a class of time-aware matrix/tensor factorization models, which adopts time series based parameterizations and models user/item drifting behaviors at multiple temporal resolutions. We also incorporate the taxonomical structure into the item parameters by introducing sharing parameters between ancestors and descendants in the taxonomy. Finally, we have identified some conditions that systematically affect the effectiveness of different types of models and parameter settings. Based on these findings, we designed an informative ensemble framework, which considers additional meta features when making predictions for a particular pair of user and item. Using these techniques, we built the best single model reported officially, and our final ensemble model got third place in KDD Cup 2011.'
volume: 18 URL: https://proceedings.mlr.press/v18/zheng12a.html PDF: http://proceedings.mlr.press/v18/zheng12a/zheng12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-zheng12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Zhao family: Zheng - given: Tianqi family: Chen - given: Nathan family: Liu - given: Qiang family: Yang - given: Yong family: Yu editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 75-97 id: zheng12a issued: date-parts: - 2012 - 6 - 1 firstpage: 75 lastpage: 97 published: 2012-06-01 00:00:00 +0000 - title: 'Novel Models and Ensemble Techniques to Discriminate Favorite Items from Unrated Ones for Personalized Music Recommendation' abstract: 'The Track 2 problem in KDD-Cup 2011 (music recommendation) is to discriminate music tracks highly rated by a given user from those which are overall highly rated, but not rated by the given user. The training dataset consists of not only user rating history, but also the taxonomic information of track, artist, album, and genre. This paper describes the solution of the National Taiwan University team which ranked first place in the competition. We exploited a diverse set of models (neighborhood models, latent models, Bayesian Personalized Ranking models, and random-walk models) with local blending and global ensemble to achieve 97.45% in accuracy on the testing dataset.' volume: 18 URL: https://proceedings.mlr.press/v18/mckenzie12a.html PDF: http://proceedings.mlr.press/v18/mckenzie12a/mckenzie12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-mckenzie12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Todd G.
family: McKenzie - given: Chun-Sung family: Ferng - given: Yao-Nan family: Chen - given: Chun-Liang family: Li - given: Cheng-Hao family: Tsai - given: Kuan-Wei family: Wu - given: Ya-Hsuan family: Chang - given: Chung-Yi family: Li - given: Wei-Shih family: Lin - given: Shu-Hao family: Yu - given: Chieh-Yen family: Lin - given: Po-Wei family: Wang - given: Chia-Mau family: Ni - given: Wei-Lun family: Su - given: Tsung-Ting family: Kuo - given: Chen-Tse family: Tsai - given: Po-Lung family: Chen - given: Rong-Bing family: Chiu - given: Ku-Chun family: Chou - given: Yu-Cheng family: Chou - given: Chien-Chih family: Wang - given: Chen-Hung family: Wu - given: Hsuan-Tien family: Lin - given: Chih-Jen family: Lin - given: Shou-De family: Lin editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 101-135 id: mckenzie12a issued: date-parts: - 2012 - 6 - 1 firstpage: 101 lastpage: 135 published: 2012-06-01 00:00:00 +0000 - title: 'Hybrid Recommendation Models for Binary User Preference Prediction Problem' abstract: 'This paper presents detailed information on our solutions to Task 2 of KDD Cup 2011. Task 2 is called the binary user preference prediction problem in the paper because it aims at separating tracks rated highly by specific users from tracks not rated by them, and the solutions of this task can be easily applied to binary user behavior data. In the contest, we first implemented many different models, including neighborhood-based models, latent factor models, content-based models, etc. Then, linear combination is used to combine different models together. Finally, we used robust post-processing to further refine the special user-item pairs. The final error rate is 2.4808%, which placed second on the leaderboard.'
volume: 18 URL: https://proceedings.mlr.press/v18/lai12a.html PDF: http://proceedings.mlr.press/v18/lai12a/lai12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-lai12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Siwei family: Lai - given: Yang family: Liu - given: Huxiang family: Gu - given: Liheng family: Xu - given: Kang family: Liu - given: Shiming family: Xiang - given: Jun family: Zhao - given: Rui family: Diao - given: Liang family: Xiang - given: Hang family: Li - given: Dong family: Wang editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 137-151 id: lai12a issued: date-parts: - 2012 - 6 - 1 firstpage: 137 lastpage: 151 published: 2012-06-01 00:00:00 +0000 - title: 'Collaborative Filtering Ensemble for Ranking' abstract: 'This paper provides the solution of the team “commendo” on the Track 2 dataset of the KDD Cup 2011 (Dror et al.). Yahoo Labs provides a snapshot of their music-rating database as the dataset for the competition, consisting of approximately 62 million ratings from 250k users on 300k items. The dataset includes hierarchical information about the items. The goal of the competition is to distinguish between “High rated” and “Not rated” items of a user. The rating scale is discrete and ranges from 0 to 100, while a “High” rating is a rating $\geq 80$. The error measure is the percent of false rated tracks over all users, known as the fraction of misclassifications. The task is to minimize this error rate, hence the ranking should be optimized. Our final submission is a blend of different collaborative filtering algorithms, enhanced with basic statistics. The algorithms are trained consecutively and they are blended together with a neural network. Each of the algorithms optimizes a rank error measure.'
volume: 18 URL: https://proceedings.mlr.press/v18/jahrer12b.html PDF: http://proceedings.mlr.press/v18/jahrer12b/jahrer12b.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-jahrer12b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Michael family: Jahrer - given: Andreas family: Töscher editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 153-167 id: jahrer12b issued: date-parts: - 2012 - 6 - 1 firstpage: 153 lastpage: 167 published: 2012-06-01 00:00:00 +0000 - title: 'Taxonomy-Informed Latent Factor Models for Implicit Feedback' abstract: 'We describe a latent-factor-model-based approach to the Track 2 task of KDD Cup 2011, which required learning to discriminate between highly rated and unrated items from a large dataset of music ratings. We take the pairwise ranking route, training our models to rank the highly rated items above the unrated items that are sampled from the same distribution. Using the item relationship information from the provided taxonomy to constrain item representations results in improved predictive performance. Providing the model with features summarizing the user’s rating history as it relates to the item being ranked leads to further gains, producing the best single model result on Track 2.' 
volume: 18 URL: https://proceedings.mlr.press/v18/mnih12a.html PDF: http://proceedings.mlr.press/v18/mnih12a/mnih12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-mnih12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Andriy family: Mnih editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 169-181 id: mnih12a issued: date-parts: - 2012 - 6 - 1 firstpage: 169 lastpage: 181 published: 2012-06-01 00:00:00 +0000 - title: 'Feature Engineering in User’s Music Preference Prediction' abstract: 'The second track of this year’s KDD Cup asked contestants to separate a user’s highly rated songs from unrated songs for a large set of Yahoo! Music listeners. We cast this task as a binary classification problem and addressed it utilizing gradient boosted decision trees. We created a set of highly predictive features, each with a clear explanation. These features were grouped into five categories: hierarchical linkage features, track-based statistical features, user-based statistical features, features derived from the k-nearest neighbors of the users, and features derived from the k-nearest neighbors of the items. No music domain knowledge was needed to create these features. We demonstrate that each group of features improved the prediction accuracy of the classification model. We also discuss the top predictive features of each category in this paper.' 
volume: 18 URL: https://proceedings.mlr.press/v18/xie12a.html PDF: http://proceedings.mlr.press/v18/xie12a/xie12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-xie12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Jianjun family: Xie - given: Scott family: Leishman - given: Liang family: Tian - given: David family: Lisuk - given: Seongjoon family: Koo - given: Matthias family: Blume editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 183-197 id: xie12a issued: date-parts: - 2012 - 6 - 1 firstpage: 183 lastpage: 197 published: 2012-06-01 00:00:00 +0000 - title: 'Combining Predictors for Recommending Music: the False Positives’ approach to KDD Cup track 2' abstract: 'We describe our solution for the KDD Cup 2011 track 2 challenge. Our solution relies heavily on ensembling together diverse individual models for the prediction task, and achieved a final leaderboard/Test 1 misclassification rate of 3.8863%. This paper provides details on both the modeling and ensemble creation steps.' volume: 18 URL: https://proceedings.mlr.press/v18/balakrishnan12a.html PDF: http://proceedings.mlr.press/v18/balakrishnan12a/balakrishnan12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-balakrishnan12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Suhrid family: Balakrishnan - given: Rensheng family: Wang - given: Carlos family: Scheidegger - given: Angus family: MacLellan - given: Yifan family: Hu - given: Aaron family: Archer - given: Shankar family: Krishnan - given: David family: Applegate - given: Guang Qin family: Ma - given: S.
Tom family: Au editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 199-213 id: balakrishnan12a issued: date-parts: - 2012 - 6 - 1 firstpage: 199 lastpage: 213 published: 2012-06-01 00:00:00 +0000 - title: 'Committee Based Prediction System for Recommendation: KDD Cup 2011, Track2' abstract: 'This paper describes a solution to the 2011 KDD Cup competition, Track2: discriminating between highly rated tracks and unrated tracks in a Yahoo! Music dataset. Our approach was to use supervised learning based on 65 features generated using various techniques such as collaborative filtering, SVD, and similarity scoring. During our modeling stage, we created a number of predictors including logistic regression, artificial neural networks and gradient-boosted decision trees. To further improve robustness and reduce the variance, we used three of our top performing models and took a weighted average for the final submission, which achieved 4.3768% error.' 
volume: 18 URL: https://proceedings.mlr.press/v18/zhang12a.html PDF: http://proceedings.mlr.press/v18/zhang12a/zhang12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-zhang12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Hang family: Zhang - given: Eric family: Riedl - given: Valery family: Petrushin - given: Siddharth family: Pal - given: Jacob family: Spoelstra editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 215-229 id: zhang12a issued: date-parts: - 2012 - 6 - 1 firstpage: 215 lastpage: 229 published: 2012-06-01 00:00:00 +0000 - title: 'Personalized Ranking for Non-Uniformly Sampled Items' abstract: 'We develop an adapted version of the Bayesian Personalized Ranking (BPR) optimization criterion (Rendle et al., 2009) that takes the non-uniform sampling of negative test items – as in track 2 of the KDD Cup 2011 – into account. Furthermore, we present a modified version of the generic BPR learning algorithm that maximizes the new criterion. We use it to train ranking matrix factorization models as components of an ensemble. Additionally, we combine the ranking predictions with rating prediction models to also take into account rating data. With an ensemble of such combined models, we ranked 8th (out of more than 300 teams) in track 2 of the KDD Cup 2011, without using the additional taxonomic information offered by the competition organizers.' 
volume: 18 URL: https://proceedings.mlr.press/v18/gantner12a.html PDF: http://proceedings.mlr.press/v18/gantner12a/gantner12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-gantner12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Zeno family: Gantner - given: Lucas family: Drumond - given: Christoph family: Freudenthaler - given: Lars family: Schmidt-Thieme editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 231-247 id: gantner12a issued: date-parts: - 2012 - 6 - 1 firstpage: 231 lastpage: 247 published: 2012-06-01 00:00:00 +0000 - title: 'The Love-Hate Square Counting Method for Recommender Systems' abstract: 'Recommender systems provide personalized suggestions to users and are critical to the success of many e-commerce sites, such as Netflix and Amazon. Outside of e-commerce, recommender systems can be deployed in fields such as intelligence analysis, for recommending high-quality information sources to analysts for further examination. In this work, we present the square counting method for rating predictions in recommender systems. Our method is based on analyzing the bipartite rating network with score-labeled edges representing user nodes’ ratings to item nodes. Edges are denoted as an I-love-it or I-hate-it edge based on whether the rating score on the edge is above or below a threshold. For a target user-item pair, we count the number for each configuration of love-hate squares that involve the target pair, where the sequence of I-love-it or I-hate-it edges determines the particular configuration. The counts are used as features in a supervised machine learning framework for training and rating prediction. The method is implemented and empirically evaluated on a large-scale Yahoo! music user-item rating dataset.
Results show that the square counting method is fast, simple to parallelize, scalable to massive datasets and makes highly accurate predictions. Finally, we report an interesting empirical finding that configurations with consecutive I-hate-it edges seem to provide the most powerful signal in predicting a user’s love for an item.' volume: 18 URL: https://proceedings.mlr.press/v18/kong12a.html PDF: http://proceedings.mlr.press/v18/kong12a/kong12a.pdf edit: https://github.com/mlresearch//v18/edit/gh-pages/_posts/2012-06-01-kong12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of KDD Cup 2011' publisher: 'PMLR' author: - given: Joseph S. family: Kong - given: Kyle family: Teague - given: Justin family: Kessler editor: - given: Gideon family: Dror - given: Yehuda family: Koren - given: Markus family: Weimer page: 249-261 id: kong12a issued: date-parts: - 2012 - 6 - 1 firstpage: 249 lastpage: 261 published: 2012-06-01 00:00:00 +0000