Open-ended learning in symmetric zero-sum games

David Balduzzi; Marta Garnelo; Yoram Bachrach; Wojciech Czarnecki; Julien Perolat; Max Jaderberg; Thore Graepel

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:434-443, 2019.

Abstract

Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them ‘winner’ and ‘loser’. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective – we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.
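As a small illustration of the abstract's point about strategic cycles, the sketch below treats rock-paper-scissors as a function that evaluates pairs of strategies. It is not taken from the paper: the names `STRATEGIES`, `BEATS`, `evaluate`, `is_antisymmetric`, and `has_unbeaten_strategy` are hypothetical, and the +1/0/-1 scoring is an assumed encoding of "winner" and "loser".

```python
# Illustrative only: rock-paper-scissors as a symmetric zero-sum evaluation
# of strategy pairs. All names and the +1/0/-1 scoring are assumptions.
STRATEGIES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def evaluate(a, b):
    """Return +1 if a beats b, -1 if b beats a, and 0 for a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def is_antisymmetric():
    """Zero-sum and symmetric: swapping the two players flips the sign."""
    return all(
        evaluate(a, b) == -evaluate(b, a)
        for a in STRATEGIES
        for b in STRATEGIES
    )


def has_unbeaten_strategy():
    """True if some strategy never loses, i.e. a clear 'strongest' exists."""
    return any(
        all(evaluate(a, b) >= 0 for b in STRATEGIES)
        for a in STRATEGIES
    )
```

Here `is_antisymmetric()` holds while `has_unbeaten_strategy()` does not: every strategy loses to exactly one other, so there is no fixed opponent against which "increasing strength" is well defined.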

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-balduzzi19a,
  title = {Open-ended learning in symmetric zero-sum games},
  author = {Balduzzi, David and Garnelo, Marta and Bachrach, Yoram and Czarnecki, Wojciech and Perolat, Julien and Jaderberg, Max and Graepel, Thore},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {434--443},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/balduzzi19a/balduzzi19a.pdf},
  url = {https://proceedings.mlr.press/v97/balduzzi19a.html},
  abstract = {Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them ‘winner’ and ‘loser’. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective – we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.}
}

Endnote

%0 Conference Paper
%T Open-ended learning in symmetric zero-sum games
%A David Balduzzi
%A Marta Garnelo
%A Yoram Bachrach
%A Wojciech Czarnecki
%A Julien Perolat
%A Max Jaderberg
%A Thore Graepel
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-balduzzi19a
%I PMLR
%P 434--443
%U https://proceedings.mlr.press/v97/balduzzi19a.html
%V 97
%X Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them ‘winner’ and ‘loser’. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective – we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.

APA

Balduzzi, D., Garnelo, M., Bachrach, Y., Czarnecki, W., Perolat, J., Jaderberg, M., & Graepel, T. (2019). Open-ended learning in symmetric zero-sum games. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:434-443. Available from https://proceedings.mlr.press/v97/balduzzi19a.html.
