A Theoretical Analysis of NDCG Type Ranking Measures

Yining Wang; Liwei Wang; Yuanzhi Li; Di He; Tie-Yan Liu

Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:25-54, 2013.

Abstract

Ranking has been extensively studied in information retrieval, machine learning and statistics. A central problem in ranking is to design a ranking measure for evaluation of ranking functions. State-of-the-art learning to rank methods often train a ranking function by using a ranking measure as the objective to maximize. In this paper we study, from a theoretical perspective, the widely used NDCG type ranking measures. We analyze the behavior of these ranking measures as the number of objects to rank gets large. We first show that, whatever the ranking function is, the standard NDCG, which adopts a logarithmic discount, converges to 1 as the number of items to rank goes to infinity. At first sight, this result seems to imply that NDCG cannot distinguish good and bad ranking functions, contradicting the empirical success of NDCG in many applications. Our next main result is a theorem which shows that although NDCG converges to the same limit for all ranking functions, it has distinguishability for ranking functions in a strong sense. We then investigate NDCG with other possible discounts. Specifically, we characterize the class of feasible discount functions for NDCG. We also compare the limiting behavior and the power of distinguishability of these feasible NDCG type measures to the standard NDCG. We next turn to the cut-off version of NDCG, i.e., NDCG@k. The most popular NDCG@k uses a combination of a slow logarithmic decay and a hard cut-off as its discount. So a natural question is why not simply use a smooth discount with fast decay? We show that if the decay is too fast, then the NDCG measure does not have strong power of distinguishability and may not even converge. Finally, feasible NDCG@k are also discussed.
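
For readers unfamiliar with the measure, the following is a minimal Python sketch of NDCG and NDCG@k with the logarithmic discount mentioned in the abstract. It is an illustration under stated assumptions, not the paper's own code: the gain 2^rel - 1 is one common convention and may differ from the paper's definitions, and the names log_discount, dcg, ndcg and worst_binary are hypothetical helpers introduced here.

```python
import math
from typing import Callable, Optional, Sequence


def log_discount(rank: int) -> float:
    """Logarithmic discount 1 / log2(1 + rank), with ranks starting at 1."""
    return 1.0 / math.log2(1.0 + rank)


def dcg(relevances: Sequence[float],
        discount: Callable[[int], float] = log_discount,
        k: Optional[int] = None) -> float:
    """Discounted cumulative gain of a list given in ranked order.

    Assumption: gain is 2**rel - 1 (a common convention, not taken from the paper).
    If k is given, only the top k positions contribute (hard cut-off).
    """
    top = relevances if k is None else relevances[:k]
    return sum((2.0 ** rel - 1.0) * discount(i)
               for i, rel in enumerate(top, start=1))


def ndcg(relevances: Sequence[float],
         discount: Callable[[int], float] = log_discount,
         k: Optional[int] = None) -> float:
    """DCG normalized by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), discount, k)
    return dcg(relevances, discount, k) / ideal if ideal > 0 else 0.0


def worst_binary(n: int) -> list:
    """Worst ordering of n items, half irrelevant (0) ranked before half relevant (1)."""
    return [0] * (n // 2) + [1] * (n - n // 2)


if __name__ == "__main__":
    # Even the worst ordering scores higher as the list grows, which is the
    # kind of limiting behavior the abstract describes for the log discount.
    for n in (10, 100, 1000, 10000):
        print(n, round(ndcg(worst_binary(n)), 4))
```

Passing a different discount callable (for example a faster-decaying one) or a cut-off k lets the same sketch be used to experiment informally with the variants the abstract discusses.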

Cite this Paper

BibTeX


@InProceedings{pmlr-v30-Wang13,
  title = 	 {A Theoretical Analysis of NDCG Type Ranking Measures},
  author = 	 {Wang, Yining and Wang, Liwei and Li, Yuanzhi and He, Di and Liu, Tie-Yan},
  booktitle = 	 {Proceedings of the 26th Annual Conference on Learning Theory},
  pages = 	 {25--54},
  year = 	 {2013},
  editor = 	 {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume = 	 {30},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Princeton, NJ, USA},
  month = 	 {12--14 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v30/Wang13.pdf},
  url = 	 {https://proceedings.mlr.press/v30/Wang13.html},
  abstract = 	 {Ranking has been extensively studied in information retrieval, machine learning and statistics. A central problem in ranking is to design a ranking measure for evaluation of ranking functions. State-of-the-art learning to rank methods often train a ranking function by using a ranking measure as the objective to maximize. In this paper we study, from a theoretical perspective, the widely used NDCG type ranking measures. We analyze the behavior of these ranking measures as the number of objects to rank gets large. We first show that, whatever the ranking function is, the standard NDCG, which adopts a logarithmic discount, converges to 1 as the number of items to rank goes to infinity. At first sight, this result seems to imply that NDCG cannot distinguish good and bad ranking functions, contradicting the empirical success of NDCG in many applications. Our next main result is a theorem which shows that although NDCG converges to the same limit for all ranking functions, it has distinguishability for ranking functions in a strong sense. We then investigate NDCG with other possible discounts. Specifically, we characterize the class of feasible discount functions for NDCG. We also compare the limiting behavior and the power of distinguishability of these feasible NDCG type measures to the standard NDCG. We next turn to the cut-off version of NDCG, i.e., NDCG@k. The most popular NDCG@k uses a combination of a slow logarithmic decay and a hard cut-off as its discount. So a natural question is why not simply use a smooth discount with fast decay? We show that if the decay is too fast, then the NDCG measure does not have strong power of distinguishability and may not even converge. Finally, feasible NDCG@k are also discussed.}
}

Endnote

%0 Conference Paper
%T A Theoretical Analysis of NDCG Type Ranking Measures
%A Yining Wang
%A Liwei Wang
%A Yuanzhi Li
%A Di He
%A Tie-Yan Liu
%B Proceedings of the 26th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2013
%E Shai Shalev-Shwartz
%E Ingo Steinwart
%F pmlr-v30-Wang13
%I PMLR
%P 25--54
%U https://proceedings.mlr.press/v30/Wang13.html
%V 30
%X Ranking has been extensively studied in information retrieval, machine learning and statistics. A central problem in ranking is to design a ranking measure for evaluation of ranking functions. State-of-the-art learning to rank methods often train a ranking function by using a ranking measure as the objective to maximize. In this paper we study, from a theoretical perspective, the widely used NDCG type ranking measures. We analyze the behavior of these ranking measures as the number of objects to rank gets large. We first show that, whatever the ranking function is, the standard NDCG, which adopts a logarithmic discount, converges to 1 as the number of items to rank goes to infinity. At first sight, this result seems to imply that NDCG cannot distinguish good and bad ranking functions, contradicting the empirical success of NDCG in many applications. Our next main result is a theorem which shows that although NDCG converges to the same limit for all ranking functions, it has distinguishability for ranking functions in a strong sense. We then investigate NDCG with other possible discounts. Specifically, we characterize the class of feasible discount functions for NDCG. We also compare the limiting behavior and the power of distinguishability of these feasible NDCG type measures to the standard NDCG. We next turn to the cut-off version of NDCG, i.e., NDCG@k. The most popular NDCG@k uses a combination of a slow logarithmic decay and a hard cut-off as its discount. So a natural question is why not simply use a smooth discount with fast decay? We show that if the decay is too fast, then the NDCG measure does not have strong power of distinguishability and may not even converge. Finally, feasible NDCG@k are also discussed.

RIS


TY  - CPAPER
TI  - A Theoretical Analysis of NDCG Type Ranking Measures
AU  - Yining Wang
AU  - Liwei Wang
AU  - Yuanzhi Li
AU  - Di He
AU  - Tie-Yan Liu
BT  - Proceedings of the 26th Annual Conference on Learning Theory
DA  - 2013/06/13
ED  - Shai Shalev-Shwartz
ED  - Ingo Steinwart
ID  - pmlr-v30-Wang13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 30
SP  - 25
EP  - 54
L1  - http://proceedings.mlr.press/v30/Wang13.pdf
UR  - https://proceedings.mlr.press/v30/Wang13.html
AB  - Ranking has been extensively studied in information retrieval, machine learning and statistics. A central problem in ranking is to design a ranking measure for evaluation of ranking functions. State-of-the-art learning to rank methods often train a ranking function by using a ranking measure as the objective to maximize. In this paper we study, from a theoretical perspective, the widely used NDCG type ranking measures. We analyze the behavior of these ranking measures as the number of objects to rank gets large. We first show that, whatever the ranking function is, the standard NDCG, which adopts a logarithmic discount, converges to 1 as the number of items to rank goes to infinity. At first sight, this result seems to imply that NDCG cannot distinguish good and bad ranking functions, contradicting the empirical success of NDCG in many applications. Our next main result is a theorem which shows that although NDCG converges to the same limit for all ranking functions, it has distinguishability for ranking functions in a strong sense. We then investigate NDCG with other possible discounts. Specifically, we characterize the class of feasible discount functions for NDCG. We also compare the limiting behavior and the power of distinguishability of these feasible NDCG type measures to the standard NDCG. We next turn to the cut-off version of NDCG, i.e., NDCG@k. The most popular NDCG@k uses a combination of a slow logarithmic decay and a hard cut-off as its discount. So a natural question is why not simply use a smooth discount with fast decay? We show that if the decay is too fast, then the NDCG measure does not have strong power of distinguishability and may not even converge. Finally, feasible NDCG@k are also discussed.
ER  -

APA


Wang, Y., Wang, L., Li, Y., He, D. & Liu, T. (2013). A Theoretical Analysis of NDCG Type Ranking Measures. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:25-54. Available from https://proceedings.mlr.press/v30/Wang13.html.
