Generalized Beliefs for Cooperative AI

Darius Muglich, Luisa M Zintgraf, Christian A Schroeder De Witt, Shimon Whiteson, Jakob Foerster
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:16062-16082, 2022.

Abstract

Self-play is a common method for constructing solutions in Markov games that can yield optimal policies in collaborative settings. However, these policies often adopt highly specialized conventions that make playing with a novel partner difficult. To address this, recent approaches encode symmetry and convention awareness into policy training, but these approaches require strong environmental assumptions and can complicate policy training. Instead, we propose moving the learning of conventions to the belief space. Specifically, we propose a belief learning paradigm that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time. We show how to leverage this belief model for both search and training of a best response over a pool of policies to greatly improve zero-shot coordination. We also show how our paradigm promotes explainability and interpretability of nuanced agent conventions.
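To make the core idea concrete, below is a minimal sketch (not the authors' code; PyTorch, the toy linear policies, and all names are illustrative assumptions) of the training setup the abstract describes: a recurrent belief model is fit to rollouts from a pool of diverse policies, so that at test time it can condition on a trajectory from a partner policy it never saw during training. Here the belief target is the partner's next action, standing in for whatever private state the belief is maintained over.

# Minimal sketch (illustrative, not the authors' code): train a recurrent
# belief model on rollouts from a pool of policies so it can later track
# beliefs over trajectories produced by policies unseen during training.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HID_DIM = 16, 5, 64

class BeliefModel(nn.Module):
    """Maps a partner's observation-action history to per-step logits
    over the quantity the belief tracks (here, the partner's next action,
    a stand-in for private state in a real Markov game)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM + ACT_DIM, HID_DIM, batch_first=True)
        self.head = nn.Linear(HID_DIM, ACT_DIM)

    def forward(self, obs, acts_onehot):
        x = torch.cat([obs, acts_onehot], dim=-1)  # (B, T, obs+act)
        h, _ = self.rnn(x)
        return self.head(h)                        # (B, T, ACT_DIM) logits

def rollout(policy, T=20):
    """Generate a synthetic trajectory from one policy in the pool.
    A real environment and real agents would replace this toy generator."""
    obs = torch.randn(T, OBS_DIM)
    logits = obs @ policy                          # policy: (OBS_DIM, ACT_DIM)
    acts = torch.distributions.Categorical(logits=logits).sample()
    return obs, acts

# Pool of toy linear policies standing in for diverse self-play agents.
pool = [torch.randn(OBS_DIM, ACT_DIM) for _ in range(8)]
model = BeliefModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    policy = pool[step % len(pool)]
    obs, acts = rollout(policy)
    acts_onehot = nn.functional.one_hot(acts, ACT_DIM).float()
    # Predict the action at step t from the history up to t-1.
    logits = model(obs[None, :-1], acts_onehot[None, :-1])
    loss = loss_fn(logits.squeeze(0), acts[1:])
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, the same model conditions on a rollout from a policy that
# was NOT in the training pool, decoding its conventions online.

The design point the sketch illustrates is that training across a sufficiently diverse pool is what lets the belief generalize; evaluating the model on rollouts from held-out policies is the natural test of whether it can decode unseen conventions zero-shot.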

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-muglich22a,
  title     = {Generalized Beliefs for Cooperative {AI}},
  author    = {Muglich, Darius and Zintgraf, Luisa M and De Witt, Christian A Schroeder and Whiteson, Shimon and Foerster, Jakob},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {16062--16082},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/muglich22a/muglich22a.pdf},
  url       = {https://proceedings.mlr.press/v162/muglich22a.html},
  abstract  = {Self-play is a common method for constructing solutions in Markov games that can yield optimal policies in collaborative settings. However, these policies often adopt highly-specialized conventions that make playing with a novel partner difficult. To address this, recent approaches rely on encoding symmetry and convention-awareness into policy training, but these require strong environmental assumptions and can complicate policy training. To overcome this, we propose moving the learning of conventions to the belief space. Specifically, we propose a belief learning paradigm that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time. We show how to leverage this belief model for both search and training of a best response over a pool of policies to greatly improve zero-shot coordination. We also show how our paradigm promotes explainability and interpretability of nuanced agent conventions.}
}
Endnote
%0 Conference Paper
%T Generalized Beliefs for Cooperative AI
%A Darius Muglich
%A Luisa M Zintgraf
%A Christian A Schroeder De Witt
%A Shimon Whiteson
%A Jakob Foerster
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-muglich22a
%I PMLR
%P 16062--16082
%U https://proceedings.mlr.press/v162/muglich22a.html
%V 162
%X Self-play is a common method for constructing solutions in Markov games that can yield optimal policies in collaborative settings. However, these policies often adopt highly-specialized conventions that make playing with a novel partner difficult. To address this, recent approaches rely on encoding symmetry and convention-awareness into policy training, but these require strong environmental assumptions and can complicate policy training. To overcome this, we propose moving the learning of conventions to the belief space. Specifically, we propose a belief learning paradigm that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time. We show how to leverage this belief model for both search and training of a best response over a pool of policies to greatly improve zero-shot coordination. We also show how our paradigm promotes explainability and interpretability of nuanced agent conventions.
APA
Muglich, D., Zintgraf, L.M., De Witt, C.A.S., Whiteson, S. & Foerster, J. (2022). Generalized Beliefs for Cooperative AI. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:16062-16082. Available from https://proceedings.mlr.press/v162/muglich22a.html.
