[edit]
A-PSRO: A Unified Strategy Learning Method with Advantage Metric for Normal-form Games
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:24482-24508, 2025.
Abstract
Solving the Nash equilibrium in normal-form games with large-scale strategy spaces presents significant challenges. Open-ended learning frameworks, such as PSRO and its variants, have emerged as effective solutions. However, these methods often lack an efficient metric for evaluating strategy improvement, which limits their effectiveness in approximating equilibria. In this paper, we introduce a novel evaluative metric called Advantage, which possesses desirable properties inherently connected to the Nash equilibrium, ensuring that each strategy update approaches equilibrium. Building upon this, we propose the Advantage Policy Space Response Oracle (A-PSRO), an innovative unified open-ended learning framework applicable to both zero-sum and general-sum games. A-PSRO leverages the Advantage as a refined evaluation metric, leading to a consistent learning objective for agents in normal-form games. Experiments showcase that A-PSRO significantly reduces exploitability in zero-sum games and improves rewards in general-sum games, outperforming existing algorithms and validating its practical effectiveness.