An Empirical Study of the Discreteness Prior in Low-Rank Matrix Completion
NeurIPS 2020 Workshop on Pre-registration in Machine Learning, PMLR 148:111-125, 2021.
A reasonable assumption in recommender systems is that the rows (users) and columns (items) of the rating matrix can be split into groups (communities) with the following property: each entry of the matrix is the sum of components corresponding to community behavior and a purely low-rank component corresponding to individual behavior. We investigate (1) whether such a structure is present in real-world datasets, (2) whether the knowledge of the existence of such structure alone can improve performance, without explicit information about the community memberships. To these ends, we formulate a joint optimization problem over all (completed matrix, set of communities) pairs based on a nuclear-norm regularizer which jointly encourages both low-rank solutions and the recovery of relevant communities. Since our optimization problem is non-convex and of combinatorial complexity, we propose a heuristic algorithm to solve it. Our algorithm alternatingly refines the user and item communities through a clustering step jointly supervised by nuclear-norm regularization. The algorithm is guaranteed to converge. We performed synthetic and real data experiments to confirm our hypothesis and evaluate the efficacy of our method at recovering the relevant communities. The results shows that our method is capable of retrieving such an underlying (community behaviour + continuous low-rank) structure with high accuracy if it is present.