On NDCG Consistency of Listwise Ranking Methods
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15:618-626, 2011.
We examine the consistency of listwise ranking methods with respect to the popular Normalized Discounted Cumulative Gain (NDCG) criterion. The most successful listwise approaches replace NDCG with a surrogate that is easier to optimize. We characterize NDCG consistent surrogates to discover a surprising fact: several commonly used surrogates are NDCG inconsistent. We then show how to change them so that they become NDCG consistent in a strong but natural sense. An explicit characterization of strong NDCG consistency is provided. Going beyond qualitative consistency considerations, we also give quantitive statements that enable us to transform the excess error, as measured in the surrogate, to the excess error in comparison to the Bayes optimal ranking function for NDCG. Finally, we also derive improved results if a certain natural “low noise"” or “large margin"” condition holds. Our experiments demonstrate that ensuring NDCG consistency does improve the performance of listwise ranking methods on real-world datasets. Moreover, a novel surrogate function suggested by our theoretical results leads to further improvements over NDCG consistent versions of existing surrogates. [pdf][supplementary]