Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning

Austin A. Nguyen, Anri Gu, Michael P. Wellman
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:45988-46007, 2025.

Abstract

Iterative extension of empirical game models through deep reinforcement learning (RL) has proved an effective approach for finding equilibria in complex games. When multiple equilibria exist, we may also be interested in finding solutions with particular characteristics. We address this issue of equilibrium selection in the context of Policy Space Response Oracles (PSRO), a flexible game-solving framework based on deep RL, by skewing the strategy exploration process towards higher-welfare solutions. At each iteration, we create an exploration policy that imitates high welfare-yielding behavior and train a response to the current solution, regularized to be similar to the exploration policy. With no additional simulation expense, our approach, named Ex$^2$PSRO, tends to find higher welfare equilibria than vanilla PSRO in two benchmarks: a sequential bargaining game and a social dilemma game. Further experiments demonstrate Ex$^2$PSRO’s composability with other PSRO variants and illuminate the relationship between exploration policy choice and algorithmic performance.
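To make the loop concrete, here is a minimal, hypothetical sketch of an Ex$^2$PSRO-style iteration on a tiny symmetric matrix game (a stag hunt, chosen because it has both a high-welfare and a low-welfare equilibrium). Everything below is an illustrative assumption rather than the paper's implementation: the deep-RL best-response oracle is replaced by a closed-form KL-regularized best response, the meta-solver is replicator dynamics, the exploration policy is a softmax-welfare-weighted mixture of existing policies, and the payoffs, lam, and temp values are made up.

import numpy as np

# Base game: a symmetric stag hunt (hypothetical payoffs).
# Rows = own action, columns = opponent action; action 0 = stag, 1 = hare.
# (stag, stag) is the high-welfare equilibrium; (hare, hare) the safe one.
PAYOFFS = np.array([[4.0, 0.0],
                    [3.0, 2.0]])

def empirical_payoffs(policies):
    """Meta-game matrix: M[i, j] = payoff of policy i against policy j."""
    P = np.stack(policies)                     # shape (k, n_actions)
    return P @ PAYOFFS @ P.T

def solve_meta_game(M, steps=2000, dt=0.05):
    """Symmetric meta-solver via replicator dynamics (a stand-in for
    whatever equilibrium solver is applied to the empirical game)."""
    sigma = np.full(len(M), 1.0 / len(M))
    for _ in range(steps):
        fitness = M @ sigma
        sigma += dt * sigma * (fitness - sigma @ fitness)
        sigma = np.clip(sigma, 1e-12, None)
        sigma /= sigma.sum()
    return sigma

def exploration_policy(policies, sigma, temp=0.5):
    """Imitate high-welfare behavior: mix existing policies with softmax
    weights on the joint welfare each earns against the current solution."""
    P = np.stack(policies)
    opp = sigma @ P                            # opponent's induced action dist.
    welfare = P @ (PAYOFFS + PAYOFFS.T) @ opp  # own payoff + opponent payoff
    w = np.exp((welfare - welfare.max()) / temp)
    return (w / w.sum()) @ P

def regularized_best_response(sigma, policies, pi_exp, lam=1.0):
    """Respond to the current solution, regularized toward the exploration
    policy: argmax_pi  pi . u - lam * KL(pi || pi_exp), whose closed form
    is pi(a) proportional to pi_exp(a) * exp(u(a) / lam)."""
    opp = sigma @ np.stack(policies)
    u = PAYOFFS @ opp                          # expected payoff per action
    pi = pi_exp * np.exp((u - u.max()) / lam)
    return pi / pi.sum()

# Ex^2PSRO-style outer loop, seeded with the uniform policy.
policies = [np.array([0.5, 0.5])]
for _ in range(10):
    M = empirical_payoffs(policies)
    sigma = solve_meta_game(M)                     # current solution
    pi_exp = exploration_policy(policies, sigma)   # high-welfare imitation
    policies.append(regularized_best_response(sigma, policies, pi_exp))

M = empirical_payoffs(policies)
sigma = solve_meta_game(M)
print("solution over policies:", np.round(sigma, 3))
print("welfare at solution:", round(float(sigma @ (M + M.T) @ sigma), 3))

On this game, the welfare term pulls the exploration policy toward the stag action; whether the final restricted-game solution actually lands on the high-welfare equilibrium depends on the regularization strength lam and the temperature temp, echoing the abstract's point that the choice of exploration policy shapes algorithmic performance.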

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-nguyen25b,
  title     = {Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning},
  author    = {Nguyen, Austin A. and Gu, Anri and Wellman, Michael P.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {45988--46007},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/nguyen25b/nguyen25b.pdf},
  url       = {https://proceedings.mlr.press/v267/nguyen25b.html}
}
Endnote
%0 Conference Paper
%T Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning
%A Austin A. Nguyen
%A Anri Gu
%A Michael P. Wellman
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-nguyen25b
%I PMLR
%P 45988--46007
%U https://proceedings.mlr.press/v267/nguyen25b.html
%V 267
APA
Nguyen, A.A., Gu, A. & Wellman, M.P. (2025). Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:45988-46007. Available from https://proceedings.mlr.press/v267/nguyen25b.html.
