“Other-Play” for Zero-Shot Coordination

Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4399-4410, 2020.

Abstract

We consider the problem of zero-shot coordination: constructing AI agents that can coordinate with novel partners they have not seen before (e.g., humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting, where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP) that enhances self-play by looking for more robust strategies. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores than SP agents when paired with independently trained agents as well as with human players.
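The failure mode of self-play described above can be illustrated with a toy coordination game: both players pick one of ten levers and score the lever's payoff only if they agree; nine levers are interchangeable and pay 1.0, while one distinct lever pays 0.9. The sketch below (an illustrative reconstruction, not the authors' code) computes the self-play and other-play values of each deterministic policy, where other-play relabels the partner's choice by a random symmetry of the game (here, a uniform permutation of the nine identical levers):

```python
import numpy as np

# Lever coordination game: 10 levers; both players score the lever's payoff
# only if they pick the same lever. Nine levers pay 1.0 and are
# interchangeable under a symmetry of the game; one distinct lever pays 0.9.
payoffs = np.array([1.0] * 9 + [0.9])

# Self-play value of "always pick lever i": partners trained together agree
# on lever i, so the SP value is simply payoffs[i]. SP is indifferent among
# the nine 1.0 levers and picks one arbitrarily.
sp_values = payoffs

# Other-play value: a random symmetry permutes the nine identical levers, so
# a policy committed to one symmetric lever matches the relabelled partner
# only 1/9 of the time; the distinct lever is a fixed point of the symmetry.
op_values = np.where(np.arange(10) < 9, payoffs / 9.0, payoffs)

print(sp_values.argmax())          # any of the nine 1.0 levers
print(op_values.argmax())          # 9: OP prefers the distinct 0.9 lever
```

Because independently trained SP agents may land on different (but equally good) symmetric levers and miscoordinate at test time, the symmetry-averaged OP objective makes the unambiguous 0.9 lever the optimum, which is the intuition behind OP's zero-shot robustness.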

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-hu20a,
  title     = {“{O}ther-Play” for Zero-Shot Coordination},
  author    = {Hu, Hengyuan and Lerer, Adam and Peysakhovich, Alex and Foerster, Jakob},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {4399--4410},
  year      = {2020},
  editor    = {Hal Daumé III and Aarti Singh},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/hu20a/hu20a.pdf},
  url       = {http://proceedings.mlr.press/v119/hu20a.html},
  abstract  = {We consider the problem of zero-shot coordination: constructing AI agents that can coordinate with novel partners they have not seen before (e.g., humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting, where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP) that enhances self-play by looking for more robust strategies. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores than SP agents when paired with independently trained agents as well as with human players.}
}
Endnote
%0 Conference Paper
%T “Other-Play” for Zero-Shot Coordination
%A Hengyuan Hu
%A Adam Lerer
%A Alex Peysakhovich
%A Jakob Foerster
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-hu20a
%I PMLR
%P 4399--4410
%U http://proceedings.mlr.press/v119/hu20a.html
%V 119
%X We consider the problem of zero-shot coordination: constructing AI agents that can coordinate with novel partners they have not seen before (e.g., humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting, where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP) that enhances self-play by looking for more robust strategies. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores than SP agents when paired with independently trained agents as well as with human players.
APA
Hu, H., Lerer, A., Peysakhovich, A. & Foerster, J. (2020). “Other-Play” for Zero-Shot Coordination. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4399-4410. Available from http://proceedings.mlr.press/v119/hu20a.html.