Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:45988-46007, 2025.
Abstract
Iterative extension of empirical game models through deep reinforcement learning (RL) has proved an effective approach for finding equilibria in complex games. When multiple equilibria exist, we may also be interested in finding solutions with particular characteristics. We address this issue of equilibrium selection in the context of Policy Space Response Oracles (PSRO), a flexible game-solving framework based on deep RL, by skewing the strategy exploration process towards higher-welfare solutions. At each iteration, we create an exploration policy that imitates high-welfare-yielding behavior and train a response to the current solution, regularized to be similar to the exploration policy. With no additional simulation expense, our approach, named Ex$^2$PSRO, tends to find higher-welfare equilibria than vanilla PSRO in two benchmarks: a sequential bargaining game and a social dilemma game. Further experiments demonstrate Ex$^2$PSRO’s composability with other PSRO variants and illuminate the relationship between exploration policy choice and algorithmic performance.
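The abstract describes a three-step loop: solve the current empirical game, build an exploration policy by imitating high-welfare behavior, and train the next response regularized toward that policy. The sketch below illustrates this loop on a toy symmetric matrix game; it is an assumption-laden illustration, not the paper's implementation. The deep-RL oracle is replaced by an exact best response with an additive exploration bonus, the imitation step is shortcut by a welfare-weighted softmax over the payoff table, and all names and constants (`PAYOFF`, `REG_WEIGHT`, `exploration_policy`, and so on) are hypothetical.

```python
# A minimal, runnable sketch of an Ex^2 PSRO-style loop on a toy 3-action
# symmetric matrix game. Illustration under assumptions, not the authors'
# implementation: the deep-RL best-response oracle becomes an exact best
# response with an additive exploration bonus (a stand-in for the paper's
# regularization toward the exploration policy), and the imitation step
# becomes a welfare-weighted softmax over the payoff table.
import numpy as np

# Toy symmetric game (row player's payoffs; the column player gets the
# transpose). Action 1 is a safe, low-welfare equilibrium; action 2 is a
# high-welfare equilibrium that greedy best response never reaches.
PAYOFF = np.array([
    [2.0, 0.0, 0.0],
    [3.0, 1.0, 0.0],
    [0.0, 0.0, 4.0],
])
REG_WEIGHT = 2.0           # strength of the pull toward the exploration policy
WELFARE_TEMPERATURE = 1.0  # softmax temperature when imitating high welfare

def replicator_equilibrium(emp_payoff, iters=2000, lr=0.05):
    """Approximate a symmetric equilibrium of the empirical game."""
    n = emp_payoff.shape[0]
    sigma = np.ones(n) / n
    for _ in range(iters):
        u = emp_payoff @ sigma                 # expected payoff per strategy
        sigma = sigma * np.exp(lr * (u - sigma @ u))
        sigma /= sigma.sum()
    return sigma

def exploration_policy():
    """Stand-in for imitating high-welfare behavior: weight each base
    action by the softmax of the best joint welfare it can achieve."""
    welfare = PAYOFF + PAYOFF.T                # both players' payoffs summed
    scores = welfare.max(axis=1)
    w = np.exp(scores / WELFARE_TEMPERATURE)
    return w / w.sum()

def regularized_best_response(sigma, strategies, explore):
    """Respond to the current solution sigma, with a bonus for placing
    mass where the exploration policy does (a crude proxy for training a
    response regularized to stay near the exploration policy)."""
    opponent = np.array(strategies).T @ sigma  # opponent mix over base actions
    values = PAYOFF @ opponent + REG_WEIGHT * explore
    best = np.zeros(len(PAYOFF))
    best[np.argmax(values)] = 1.0
    return best

explore = exploration_policy()
strategies = [np.array([1.0, 0.0, 0.0])]       # start from a single strategy
for _ in range(5):
    S = np.array(strategies)
    emp_payoff = S @ PAYOFF @ S.T              # "simulate" the empirical game
    sigma = replicator_equilibrium(emp_payoff) # meta-solver step
    new = regularized_best_response(sigma, strategies, explore)
    if any(np.array_equal(new, s) for s in strategies):
        break                                  # no novel response: stop
    strategies.append(new)

print("discovered actions:", [int(s.argmax()) for s in strategies])
```

In this toy run the bonus is what lets the loop escape: with `REG_WEIGHT = 0` it stops after discovering action 1 (the low-welfare equilibrium), while with the bonus it also discovers action 2 and the meta-solver settles on the high-welfare equilibrium, echoing the selection effect the abstract describes.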