[edit]

# Enhanced statistical rankings via targeted data collection

*Proceedings of the 30th International Conference on Machine Learning*, PMLR 28(1):489-497, 2013.

#### Abstract

Given a graph where vertices represent alternatives and pairwise comparison data, y_ij, is given on the edges, the statistical ranking problem is to find a potential function, defined on the vertices, such that the gradient of the potential function agrees with pairwise comparisons. We study the dependence of the statistical ranking problem on the available pairwise data, i.e., pairs (i,j) for which the pairwise comparison data y_ij is known, and propose a framework to identify data which, when augmented with the current dataset, maximally increases the Fisher information of the ranking. Under certain assumptions, the data collection problem decouples, reducing to a problem of finding an edge set on the graph (with a fixed number of edges) such that the second eigenvalue of the graph Laplacian is maximal. This reduction of the data collection problem to a spectral graph-theoretic question is one of the primary contributions of this work. As an application, we study the Yahoo! Movie user rating dataset and demonstrate that the addition of a small number of well-chosen pairwise comparisons can significantly increase the Fisher informativeness of the ranking.