On the Limitations of the Elo, Real-World Games are Transitive, not Additive

Quentin Bertrand, Wojciech Marian Czarnecki, Gauthier Gidel
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:2905-2921, 2023.

Abstract

The Elo score has been extensively used to rank players by their skill or strength in competitive games such as chess, go, or StarCraft II. The Elo score implicitly assumes games have a strong additive—hence transitive—component. In this paper, we investigate the challenge of identifying transitive components in games. As a starting point, we show that the Elo score provably fails to extract the transitive component of some elementary transitive games. Based on this observation, we propose an alternative ranking system which properly extracts the transitive components in these games. Finally, we conduct an in-depth empirical validation on real-world game payoff matrices: it shows significant prediction performance improvements compared to the Elo score.
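As a rough illustration of the "additive, hence transitive" assumption mentioned in the abstract, the sketch below uses the conventional Elo expected-score formula (base-10 logistic with a scale of 400). The formula's constants, the function names, and the three ratings are assumptions for illustration, not taken from the paper. It checks that, under this model, the log-odds of winning add up along a chain of players.

```python
import math


def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Conventional Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def log_odds(p: float) -> float:
    """Logit of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical ratings, chosen only for illustration.
ratings = {"a": 1800.0, "b": 1600.0, "c": 1500.0}

p_ab = elo_expected_score(ratings["a"], ratings["b"])
p_bc = elo_expected_score(ratings["b"], ratings["c"])
p_ac = elo_expected_score(ratings["a"], ratings["c"])

# Additivity: logit(a beats c) equals logit(a beats b) + logit(b beats c),
# because each logit is proportional to a rating difference.
additive_gap = abs(log_odds(p_ac) - (log_odds(p_ab) + log_odds(p_bc)))

# Transitivity follows: if a is favoured over b and b over c, a is favoured over c.
is_transitive = (p_ab > 0.5 and p_bc > 0.5) and p_ac > 0.5
```

Because a single scalar per player determines every pairwise prediction here, the model cannot represent cyclic matchups and imposes one specific additive structure on the win probabilities.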

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-bertrand23a,
  title = {On the Limitations of the Elo, Real-World Games are Transitive, not Additive},
  author = {Bertrand, Quentin and Czarnecki, Wojciech Marian and Gidel, Gauthier},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = {2905--2921},
  year = {2023},
  editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = {206},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v206/bertrand23a/bertrand23a.pdf},
  url = {https://proceedings.mlr.press/v206/bertrand23a.html},
  abstract = {The Elo score has been extensively used to rank players by their skill or strength in competitive games such as chess, go, or StarCraft II. The Elo score implicitly assumes games have a strong additive—hence transitive—component. In this paper, we investigate the challenge of identifying transitive components in games. As a starting point, we show that the Elo score provably fails to extract the transitive component of some elementary transitive games. Based on this observation, we propose an alternative ranking system which properly extracts the transitive components in these games. Finally, we conduct an in-depth empirical validation on real-world game payoff matrices: it shows significant prediction performance improvements compared to the Elo score.}
}
Endnote
%0 Conference Paper
%T On the Limitations of the Elo, Real-World Games are Transitive, not Additive
%A Quentin Bertrand
%A Wojciech Marian Czarnecki
%A Gauthier Gidel
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-bertrand23a
%I PMLR
%P 2905--2921
%U https://proceedings.mlr.press/v206/bertrand23a.html
%V 206
%X The Elo score has been extensively used to rank players by their skill or strength in competitive games such as chess, go, or StarCraft II. The Elo score implicitly assumes games have a strong additive—hence transitive—component. In this paper, we investigate the challenge of identifying transitive components in games. As a starting point, we show that the Elo score provably fails to extract the transitive component of some elementary transitive games. Based on this observation, we propose an alternative ranking system which properly extracts the transitive components in these games. Finally, we conduct an in-depth empirical validation on real-world game payoff matrices: it shows significant prediction performance improvements compared to the Elo score.
APA
Bertrand, Q., Czarnecki, W. M. & Gidel, G. (2023). On the Limitations of the Elo, Real-World Games are Transitive, not Additive. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:2905-2921. Available from https://proceedings.mlr.press/v206/bertrand23a.html.
