Popularity Agnostic Evaluation of Knowledge Graph Embeddings
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:1059-1068, 2020.
In this paper, we show that the distribution of entities and relations in common knowledge graphs is highly skewed, with some entities and relations being much more popular than the rest. We show that while knowledge graph embedding models give state-of-the-art performance in many relational learning tasks such as link prediction, current evaluation metrics like hits@k and mrr are biased towards popular entities and relations. We propose two new evaluation metrics, strat-hits@k and strat-mrr, which are unbiased estimators of the true hits@k and mrr when the items follow a power-law distribution. Our new metrics are generalizations of hits@k and mrr that take into account the popularity of the entities and relations in the data, with a tuning parameter determining how much emphasis the metric places on popular vs. unpopular items. Using our metrics, we run experiments on benchmark datasets to show that the performance of embedding models degrades as the popularity of the entities and relations decreases, and that currently reported results overestimate the performance of these models by magnifying their accuracy on popular items.
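The idea of a popularity-aware generalization of hits@k and mrr can be illustrated with a minimal sketch. The abstract does not give the exact definitions of strat-hits@k and strat-mrr, so the weighting below is an assumption made for illustration only: each test item is weighted by its popularity raised to the power `-beta`, where `beta` plays the role of the tuning parameter. The function names, the `popularity` counts, and the example data are all hypothetical.

```python
from typing import Sequence


def popularity_weights(popularity: Sequence[float], beta: float) -> list:
    """Assumed weighting: popularity ** (-beta). Not taken from the paper."""
    if any(p <= 0 for p in popularity):
        raise ValueError("popularity counts must be positive")
    return [p ** (-beta) for p in popularity]


def weighted_hits_at_k(ranks, popularity, k, beta=0.0):
    """Weighted fraction of test items whose rank is at most k."""
    w = popularity_weights(popularity, beta)
    hit_mass = sum(wi for wi, r in zip(w, ranks) if r <= k)
    return hit_mass / sum(w)


def weighted_mrr(ranks, popularity, beta=0.0):
    """Weighted mean of the reciprocal ranks."""
    w = popularity_weights(popularity, beta)
    return sum(wi / r for wi, r in zip(w, ranks)) / sum(w)


# Hypothetical data: popular items are ranked well, rare items poorly.
ranks = [1, 2, 10, 50]
popularity = [100, 50, 2, 1]

# beta = 0 gives every item equal weight, i.e. plain hits@k and mrr.
plain_hits = weighted_hits_at_k(ranks, popularity, k=3, beta=0.0)
plain_mrr = weighted_mrr(ranks, popularity, beta=0.0)

# beta = 1 shifts emphasis towards unpopular items.
strat_hits = weighted_hits_at_k(ranks, popularity, k=3, beta=1.0)
strat_mrr = weighted_mrr(ranks, popularity, beta=1.0)
```

Under this assumed weighting, setting `beta` to 0 recovers the standard metrics, while larger values of `beta` reduce the contribution of popular items. On the toy data, both scores drop once popular items stop dominating the average, which mirrors the overestimation effect the abstract describes.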