[edit]
Learning in Nonzero-Sum Stochastic Games with Potentials
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7688-7699, 2021.
Abstract
Multi-agent reinforcement learning (MARL) has become effective in tackling discrete cooperative game scenarios. However, MARL has yet to penetrate settings beyond those modelled by team and zero-sum games, confining it to a small subset of multi-agent systems. In this paper, we introduce a new generation of MARL learners that can handle \textit{nonzero-sum} payoff structures and continuous settings. In particular, we study the MARL problem in a class of games known as stochastic potential games (SPGs) with continuous state-action spaces. Unlike cooperative games, in which all agents share a common reward, SPGs are capable of modelling real-world scenarios where agents seek to fulfil their individual goals. We prove theoretically our learning method, $\ourmethod$, enables independent agents to learn Nash equilibrium strategies in \textit{polynomial time}. We demonstrate our framework tackles previously unsolvable tasks such as \textit{Coordination Navigation} and \textit{large selfish routing games} and that it outperforms the state of the art MARL baselines such as MADDPG and COMIX in such scenarios.