A Theoretical Analysis of NDCG Type Ranking Measures
Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:25-54, 2013.
Ranking has been extensively studied in information retrieval, machine learning and statistics. A central problem in ranking is to design a ranking measure for evaluation of ranking functions. State of the art leaning to rank methods often train a ranking function by using a ranking measure as the objective to maximize. In this paper we study, from a theoretical perspective, the widely used NDCG type ranking measures. We analyze the behavior of these ranking measures as the number of objects to rank getting large. We first show that, whatever the ranking function is, the standard NDCG which adopts a logarithmic discount, converges to 1 as the number of items to rank goes to infinity. On the first sight, this result seems to imply that NDCG cannot distinguish good and bad ranking functions, contradicting to the empirical success of NDCG in many applications. Our next main result is a theorem which shows that although NDCG converge to the same limit for all ranking functions, it has distinguishability for ranking functions in a strong sense. We then investigate NDCG with other possible discount. Specifically we characterize the class of feasible discount functions for NDCG. We also compare the limiting behavior and the power of distinguishability of these feasible NDCG type measures to the standard NDCG. We next turn to the cut-off version of NDCG, i.e., NDCG@k. The most popular NDCG@k uses a combination of a slow logarithmic decay and a hard cut-off as its discount. So a natural question is why not simply use a smooth discount with fast decay? We show that if the decay is too fast, then the NDCG measure does not have strong power of distinguishability and even not converge. Finally, feasible NDCG@k are also discussed.