OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning

Alexander Vezhnevets, Yuhuai Wu, Maria Eckstein, Rémi Leblond, Joel Z Leibo
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9733-9742, 2020.

Abstract

This paper investigates generalisation in multi-agent games, where the generality of an agent can be evaluated by playing against opponents it hasn't seen during training. We propose two new games with concealed information and a complex, non-transitive reward structure (think rock-paper-scissors). Most current deep reinforcement learning methods fail to explore the strategy space efficiently, and thus learn policies that generalise poorly to unseen opponents. We then propose a novel hierarchical agent architecture in which the hierarchy is grounded in the game-theoretic structure of the game: the top level chooses strategic responses to opponents, while the low level implements them as policies over primitive actions. This grounding facilitates credit assignment across the levels of the hierarchy. Our experiments show that the proposed hierarchical agent generalises to unseen opponents, while conventional baselines fail to generalise at all.
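To make the two-level idea concrete, below is a minimal sketch in Python with NumPy. It is an illustrative assumption, not the paper's implementation: the top level here picks a best response against an estimated opponent mixture under a hand-coded non-transitive rock-paper-scissors payoff matrix, and per-option low-level policies turn the chosen response into primitive actions. The names (HierarchicalAgent, best_response) and the argmax rule are hypothetical; in the actual agent both levels are learned with deep reinforcement learning.

import numpy as np

# Non-transitive payoff matrix a la rock-paper-scissors:
# PAYOFF[i, j] is the payoff of strategy i against strategy j.
PAYOFF = np.array([
    [ 0, -1,  1],   # rock:     ties rock, loses to paper, beats scissors
    [ 1,  0, -1],   # paper:    beats rock, ties paper, loses to scissors
    [-1,  1,  0],   # scissors: loses to rock, beats paper, ties scissors
])

def best_response(opponent_mix):
    # Top level: pick the option with the highest expected payoff
    # against the estimated opponent strategy mixture.
    return int(np.argmax(PAYOFF @ opponent_mix))

class HierarchicalAgent:
    # Two-level agent: the top level selects a strategic response (an
    # option); the matching low-level policy emits primitive actions.
    def __init__(self, low_level_policies):
        self.low_level_policies = low_level_policies  # one per option

    def act(self, observation, opponent_mix):
        option = best_response(opponent_mix)   # strategic choice
        return self.low_level_policies[option](observation)

# Usage: estimate the opponent's mixture from observed play counts,
# then act against it.
agent = HierarchicalAgent([
    lambda obs: "play_rock",
    lambda obs: "play_paper",
    lambda obs: "play_scissors",
])
counts = np.array([7.0, 2.0, 1.0])             # opponent mostly plays rock
print(agent.act(None, counts / counts.sum()))  # -> "play_paper"

In the games studied in the paper, information is concealed, so the opponent's strategy cannot simply be read off as above; it must be inferred from behaviour, which is what makes exploration of the strategy space and credit assignment across the hierarchy hard.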

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-vezhnevets20a,
  title     = {{OP}tions as {RE}sponses: Grounding behavioural hierarchies in multi-agent reinforcement learning},
  author    = {Vezhnevets, Alexander and Wu, Yuhuai and Eckstein, Maria and Leblond, R{\'e}mi and Leibo, Joel Z},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {9733--9742},
  year      = {2020},
  editor    = {III, Hal Daum{\'e} and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/vezhnevets20a/vezhnevets20a.pdf},
  url       = {https://proceedings.mlr.press/v119/vezhnevets20a.html}
}
