A-PSRO: A Unified Strategy Learning Method with Advantage Metric for Normal-form Games

Yudong Hu, Haoran Li, Congying Han, Tiande Guo, Bonan Li, Mingqiang Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:24482-24508, 2025.

Abstract

Solving the Nash equilibrium in normal-form games with large-scale strategy spaces presents significant challenges. Open-ended learning frameworks, such as PSRO and its variants, have emerged as effective solutions. However, these methods often lack an efficient metric for evaluating strategy improvement, which limits their effectiveness in approximating equilibria. In this paper, we introduce a novel evaluative metric called Advantage, which possesses desirable properties inherently connected to the Nash equilibrium, ensuring that each strategy update approaches equilibrium. Building upon this, we propose the Advantage Policy Space Response Oracle (A-PSRO), an innovative unified open-ended learning framework applicable to both zero-sum and general-sum games. A-PSRO leverages the Advantage as a refined evaluation metric, leading to a consistent learning objective for agents in normal-form games. Experiments showcase that A-PSRO significantly reduces exploitability in zero-sum games and improves rewards in general-sum games, outperforming existing algorithms and validating its practical effectiveness.
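For readers unfamiliar with the PSRO family referenced above, the following is a minimal sketch of a vanilla PSRO-style loop on a random zero-sum matrix game, showing how a population of strategies is grown with a best-response oracle and evaluated via exploitability. It is not the authors' A-PSRO method: the Advantage metric defined in the paper is not reproduced here; the sketch assumes NumPy/SciPy, a plain linear-programming meta-solver, and a pure-strategy best-response oracle, and all function names and parameters are illustrative.

# Minimal PSRO-style loop for a two-player zero-sum normal-form game (sketch).
# Assumptions: payoff matrix A holds the row player's reward; the meta-solver
# is an exact LP Nash solver on the restricted game; the oracle returns a pure
# best response. A-PSRO's Advantage metric is NOT implemented here.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    """Row player's Nash strategy and game value for the matrix game M,
    via the standard LP: maximize v subject to x^T M[:, j] >= v, x in simplex."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0               # variables: x_1..x_m, v; maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])       # v - x^T M[:, j] <= 0 for every column j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.ones(1)                               # probabilities sum to 1
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

def exploitability(A, x, y):
    """Total gain available to both players from deviating to best responses."""
    return (A @ y).max() - (x @ A).min()

def psro(A, iters=10):
    m, n = A.shape
    rows, cols = [0], [0]                           # initial populations of pure strategies
    for _ in range(iters):
        sub = A[np.ix_(rows, cols)]                 # restricted meta-game
        p, _ = solve_zero_sum(sub)                  # row meta-strategy on the population
        q, _ = solve_zero_sum(-sub.T)               # column meta-strategy on the population
        x = np.zeros(m); x[rows] = p                # lift meta-strategies to the full game
        y = np.zeros(n); y[cols] = q
        rows.append(int(np.argmax(A @ y)))          # oracle: pure best response for the row player
        cols.append(int(np.argmin(x @ A)))          # oracle: pure best response for the column player
        rows, cols = sorted(set(rows)), sorted(set(cols))
    return x, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 30))
    x, y = psro(A)
    print("exploitability:", exploitability(A, x, y))

The paper's contribution replaces the best-response evaluation in this loop with the Advantage metric, which, as stated in the abstract, yields a consistent learning objective in both zero-sum and general-sum settings.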

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-hu25n,
  title     = {A-{PSRO}: A Unified Strategy Learning Method with Advantage Metric for Normal-form Games},
  author    = {Hu, Yudong and Li, Haoran and Han, Congying and Guo, Tiande and Li, Bonan and Li, Mingqiang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {24482--24508},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/hu25n/hu25n.pdf},
  url       = {https://proceedings.mlr.press/v267/hu25n.html},
  abstract  = {Solving the Nash equilibrium in normal-form games with large-scale strategy spaces presents significant challenges. Open-ended learning frameworks, such as PSRO and its variants, have emerged as effective solutions. However, these methods often lack an efficient metric for evaluating strategy improvement, which limits their effectiveness in approximating equilibria. In this paper, we introduce a novel evaluative metric called Advantage, which possesses desirable properties inherently connected to the Nash equilibrium, ensuring that each strategy update approaches equilibrium. Building upon this, we propose the Advantage Policy Space Response Oracle (A-PSRO), an innovative unified open-ended learning framework applicable to both zero-sum and general-sum games. A-PSRO leverages the Advantage as a refined evaluation metric, leading to a consistent learning objective for agents in normal-form games. Experiments showcase that A-PSRO significantly reduces exploitability in zero-sum games and improves rewards in general-sum games, outperforming existing algorithms and validating its practical effectiveness.}
}
Endnote
%0 Conference Paper
%T A-PSRO: A Unified Strategy Learning Method with Advantage Metric for Normal-form Games
%A Yudong Hu
%A Haoran Li
%A Congying Han
%A Tiande Guo
%A Bonan Li
%A Mingqiang Li
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-hu25n
%I PMLR
%P 24482--24508
%U https://proceedings.mlr.press/v267/hu25n.html
%V 267
%X Solving the Nash equilibrium in normal-form games with large-scale strategy spaces presents significant challenges. Open-ended learning frameworks, such as PSRO and its variants, have emerged as effective solutions. However, these methods often lack an efficient metric for evaluating strategy improvement, which limits their effectiveness in approximating equilibria. In this paper, we introduce a novel evaluative metric called Advantage, which possesses desirable properties inherently connected to the Nash equilibrium, ensuring that each strategy update approaches equilibrium. Building upon this, we propose the Advantage Policy Space Response Oracle (A-PSRO), an innovative unified open-ended learning framework applicable to both zero-sum and general-sum games. A-PSRO leverages the Advantage as a refined evaluation metric, leading to a consistent learning objective for agents in normal-form games. Experiments showcase that A-PSRO significantly reduces exploitability in zero-sum games and improves rewards in general-sum games, outperforming existing algorithms and validating its practical effectiveness.
APA
Hu, Y., Li, H., Han, C., Guo, T., Li, B. & Li, M. (2025). A-PSRO: A Unified Strategy Learning Method with Advantage Metric for Normal-form Games. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:24482-24508. Available from https://proceedings.mlr.press/v267/hu25n.html.