Modelling Behavioural Diversity for Learning in Open-Ended Games

Nicolas Perez-Nieves, Yaodong Yang, Oliver Slumbers, David H Mguni, Ying Wen, Jun Wang
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8514-8524, 2021.

Abstract

Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} – convex polytopes spanned by agents’ mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve at least the same, and in most games, lower exploitability than PSRO solvers by finding effective and diverse strategies.
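The two ideas in the abstract, non-transitivity and a determinant-based diversity score, can be illustrated on Rock-Paper-Scissors. The following is a minimal sketch, not the paper's implementation. It relies on the general fact that a determinantal point process assigns a subset a probability proportional to the determinant of the corresponding kernel submatrix. Building the kernel as a Gram matrix of payoff rows, and the helper names `gram`, `det`, and `exploitability`, are assumptions made here for illustration.

```python
# Illustrative sketch only: the kernel construction and helper names are
# assumptions, not taken from the paper's code.

def gram(rows):
    """Kernel L = M M^T built from payoff rows (one row per strategy)."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def det(m):
    """Determinant by Laplace expansion; adequate for tiny matrices."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * det(minor)
    return total

def exploitability(payoff, mix):
    """Best pure-response payoff against a mixture (symmetric zero-sum game)."""
    n = len(payoff)
    return max(sum(payoff[i][j] * mix[j] for j in range(n)) for i in range(n))

# Rock-Paper-Scissors: entry [i][j] is the row player's payoff for i versus j.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

# A population of two distinct strategies has a positive determinant,
# while a population containing a duplicate collapses to zero.
distinct = det(gram([RPS[0], RPS[1]]))   # Rock and Paper
duplicate = det(gram([RPS[0], RPS[0]]))  # Rock twice

# Always playing Rock is exploitable; the uniform mixture is not.
pure_rock = exploitability(RPS, [1, 0, 0])
uniform = exploitability(RPS, [1 / 3, 1 / 3, 1 / 3])
```

Under these assumptions, the determinant rewards populations whose payoff vectors point in different directions, which is one way to read the geometric interpretation of diversity mentioned in the abstract.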

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-perez-nieves21a,
  title = {Modelling Behavioural Diversity for Learning in Open-Ended Games},
  author = {Perez-Nieves, Nicolas and Yang, Yaodong and Slumbers, Oliver and Mguni, David H and Wen, Ying and Wang, Jun},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {8514--8524},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/perez-nieves21a/perez-nieves21a.pdf},
  url = {https://proceedings.mlr.press/v139/perez-nieves21a.html},
  abstract = {Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} – convex polytopes spanned by agents’ mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve at least the same, and in most games, lower exploitability than PSRO solvers by finding effective and diverse strategies.}
}
Endnote
%0 Conference Paper
%T Modelling Behavioural Diversity for Learning in Open-Ended Games
%A Nicolas Perez-Nieves
%A Yaodong Yang
%A Oliver Slumbers
%A David H Mguni
%A Ying Wen
%A Jun Wang
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-perez-nieves21a
%I PMLR
%P 8514--8524
%U https://proceedings.mlr.press/v139/perez-nieves21a.html
%V 139
%X Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the \emph{gamescape} – convex polytopes spanned by agents’ mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve at least the same, and in most games, lower exploitability than PSRO solvers by finding effective and diverse strategies.
APA
Perez-Nieves, N., Yang, Y., Slumbers, O., Mguni, D. H., Wen, Y. & Wang, J. (2021). Modelling Behavioural Diversity for Learning in Open-Ended Games. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8514-8524. Available from https://proceedings.mlr.press/v139/perez-nieves21a.html.
