Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

Siqi Liu, Marc Lanctot, Luke Marris, Nicolas Heess
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:13793-13806, 2022.

Abstract

Learning to play optimally against any mixture over a diverse set of strategies is of significant practical interest in competitive games. In this paper, we propose simplex-NeuPL, which satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) learning, with the same network, best responses to any mixture over the simplex of basis policies. We show that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near-optimal returns against arbitrary mixture policies in a game with tractable best responses. We verify that such policies behave Bayes-optimally under uncertainty and offer insights into using this flexibility at test time. Finally, we offer evidence that learning best responses to arbitrary mixture policies is an effective auxiliary task for strategic exploration, which, by itself, can lead to more performant populations.
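The core architectural idea in the abstract — a single conditional network that represents a whole population and can best-respond to any point on the simplex of basis policies — can be sketched as a policy that takes both the observation and the opponent-mixture weights as input. The sketch below is illustrative only: the class name, layer sizes, and plain-NumPy MLP are my own assumptions, not the paper's actual architecture or training procedure.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MixtureConditionedPolicy:
    """Hypothetical sketch: one set of weights whose action distribution
    is conditioned on the observation AND a point on the simplex over
    n_basis basis policies (the opponent-mixture weights)."""

    def __init__(self, obs_dim, n_basis, n_actions, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + n_basis  # observation concatenated with mixture weights
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def action_probs(self, obs, mixture):
        mixture = np.asarray(mixture, dtype=float)
        mixture = mixture / mixture.sum()  # keep the conditioning on the simplex
        x = np.concatenate([np.asarray(obs, dtype=float), mixture])
        h = np.tanh(x @ self.w1 + self.b1)
        return softmax(h @ self.w2 + self.b2)

# Querying the same network with different mixture weights asks for
# best responses to different opponent mixtures.
policy = MixtureConditionedPolicy(obs_dim=4, n_basis=3, n_actions=2)
probs = policy.action_probs(np.zeros(4), [0.5, 0.3, 0.2])
```

Conditioning on the mixture (rather than training one network per opponent) is what makes "any-mixture" responses possible at test time: the simplex point is just another input, so it can be varied freely after training.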

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-liu22h,
  title     = {Simplex Neural Population Learning: Any-Mixture {B}ayes-Optimality in Symmetric Zero-sum Games},
  author    = {Liu, Siqi and Lanctot, Marc and Marris, Luke and Heess, Nicolas},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {13793--13806},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/liu22h/liu22h.pdf},
  url       = {https://proceedings.mlr.press/v162/liu22h.html},
  abstract  = {Learning to play optimally against any mixture over a diverse set of strategies is of important practical interests in competitive games. In this paper, we propose simplex-NeuPL that satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) using the same network, learn best-responses to any mixture over the simplex of basis policies. We show that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near optimal returns against arbitrary mixture policies in a game with tractable best-responses. We verify that such policies behave Bayes-optimally under uncertainty and offer insights in using this flexibility at test time. Finally, we offer evidence that learning best-responses to any mixture policies is an effective auxiliary task for strategic exploration, which, by itself, can lead to more performant populations.}
}
Endnote
%0 Conference Paper
%T Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games
%A Siqi Liu
%A Marc Lanctot
%A Luke Marris
%A Nicolas Heess
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-liu22h
%I PMLR
%P 13793--13806
%U https://proceedings.mlr.press/v162/liu22h.html
%V 162
%X Learning to play optimally against any mixture over a diverse set of strategies is of important practical interests in competitive games. In this paper, we propose simplex-NeuPL that satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) using the same network, learn best-responses to any mixture over the simplex of basis policies. We show that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near optimal returns against arbitrary mixture policies in a game with tractable best-responses. We verify that such policies behave Bayes-optimally under uncertainty and offer insights in using this flexibility at test time. Finally, we offer evidence that learning best-responses to any mixture policies is an effective auxiliary task for strategic exploration, which, by itself, can lead to more performant populations.
APA
Liu, S., Lanctot, M., Marris, L. & Heess, N. (2022). Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:13793-13806. Available from https://proceedings.mlr.press/v162/liu22h.html.