Probabilistic Matrix Factorization with Non-random Missing Data

Jose Miguel Hernandez-Lobato, Neil Houlsby, Zoubin Ghahramani
; Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1512-1520, 2014.

Abstract

We propose a probabilistic matrix factorization model for collaborative filtering that learns from data that is missing not at random(MNAR). Matrix factorization models exhibit state-of-the-art predictive performance in collaborative filtering. However, these models usually assume that the data is missing at random (MAR), and this is rarely the case. For example, the data is not MAR if users rate items they like more than ones they dislike. When the MAR assumption is incorrect, inferences are biased and predictive performance can suffer. Therefore, we model both the generative process for the data and the missing data mechanism. By learning these two models jointly we obtain improved performance over state-of-the-art methods when predicting the ratings and when modeling the data observation process. We present the first viable MF model for MNAR data. Our results are promising and we expect that further research on NMAR models will yield large gains in collaborative filtering.

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-hernandez-lobatob14, title = {Probabilistic Matrix Factorization with Non-random Missing Data}, author = {Jose Miguel Hernandez-Lobato and Neil Houlsby and Zoubin Ghahramani}, pages = {1512--1520}, year = {2014}, editor = {Eric P. Xing and Tony Jebara}, volume = {32}, number = {2}, series = {Proceedings of Machine Learning Research}, address = {Bejing, China}, month = {22--24 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v32/hernandez-lobatob14.pdf}, url = {http://proceedings.mlr.press/v32/hernandez-lobatob14.html}, abstract = {We propose a probabilistic matrix factorization model for collaborative filtering that learns from data that is missing not at random(MNAR). Matrix factorization models exhibit state-of-the-art predictive performance in collaborative filtering. However, these models usually assume that the data is missing at random (MAR), and this is rarely the case. For example, the data is not MAR if users rate items they like more than ones they dislike. When the MAR assumption is incorrect, inferences are biased and predictive performance can suffer. Therefore, we model both the generative process for the data and the missing data mechanism. By learning these two models jointly we obtain improved performance over state-of-the-art methods when predicting the ratings and when modeling the data observation process. We present the first viable MF model for MNAR data. Our results are promising and we expect that further research on NMAR models will yield large gains in collaborative filtering.} }
Endnote
%0 Conference Paper %T Probabilistic Matrix Factorization with Non-random Missing Data %A Jose Miguel Hernandez-Lobato %A Neil Houlsby %A Zoubin Ghahramani %B Proceedings of the 31st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2014 %E Eric P. Xing %E Tony Jebara %F pmlr-v32-hernandez-lobatob14 %I PMLR %J Proceedings of Machine Learning Research %P 1512--1520 %U http://proceedings.mlr.press %V 32 %N 2 %W PMLR %X We propose a probabilistic matrix factorization model for collaborative filtering that learns from data that is missing not at random(MNAR). Matrix factorization models exhibit state-of-the-art predictive performance in collaborative filtering. However, these models usually assume that the data is missing at random (MAR), and this is rarely the case. For example, the data is not MAR if users rate items they like more than ones they dislike. When the MAR assumption is incorrect, inferences are biased and predictive performance can suffer. Therefore, we model both the generative process for the data and the missing data mechanism. By learning these two models jointly we obtain improved performance over state-of-the-art methods when predicting the ratings and when modeling the data observation process. We present the first viable MF model for MNAR data. Our results are promising and we expect that further research on NMAR models will yield large gains in collaborative filtering.
RIS
TY - CPAPER TI - Probabilistic Matrix Factorization with Non-random Missing Data AU - Jose Miguel Hernandez-Lobato AU - Neil Houlsby AU - Zoubin Ghahramani BT - Proceedings of the 31st International Conference on Machine Learning PY - 2014/01/27 DA - 2014/01/27 ED - Eric P. Xing ED - Tony Jebara ID - pmlr-v32-hernandez-lobatob14 PB - PMLR SP - 1512 DP - PMLR EP - 1520 L1 - http://proceedings.mlr.press/v32/hernandez-lobatob14.pdf UR - http://proceedings.mlr.press/v32/hernandez-lobatob14.html AB - We propose a probabilistic matrix factorization model for collaborative filtering that learns from data that is missing not at random(MNAR). Matrix factorization models exhibit state-of-the-art predictive performance in collaborative filtering. However, these models usually assume that the data is missing at random (MAR), and this is rarely the case. For example, the data is not MAR if users rate items they like more than ones they dislike. When the MAR assumption is incorrect, inferences are biased and predictive performance can suffer. Therefore, we model both the generative process for the data and the missing data mechanism. By learning these two models jointly we obtain improved performance over state-of-the-art methods when predicting the ratings and when modeling the data observation process. We present the first viable MF model for MNAR data. Our results are promising and we expect that further research on NMAR models will yield large gains in collaborative filtering. ER -
APA
Hernandez-Lobato, J.M., Houlsby, N. & Ghahramani, Z.. (2014). Probabilistic Matrix Factorization with Non-random Missing Data. Proceedings of the 31st International Conference on Machine Learning, in PMLR 32(2):1512-1520

Related Material