Policy Gradient Play with Networked Agents in Markov Potential Games

Sarper Aydin, Ceyhun Eksin
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:184-195, 2023.

Abstract

We introduce a distributed policy gradient play algorithm with networked agents playing Markov potential games. Agents receive rewards at each stage of the game that depend on the joint actions of the agents, given a common dynamic state. Agents implement parameterized and differentiable policies to take actions against each other. A Markov potential game assumes the existence of potential value functions. In a differentiable Markov potential game, the partial gradients of the potential function with respect to an agent's individual parameters equal that agent's local value gradients. In this work, agents receive information on other agents' parameters via a communication network, in addition to their rewards. Agents then use stochastic gradients with respect to local estimates of the joint policy parameters to update their own policy parameters. We show that the agents' joint policy converges to a first-order stationary point of the Markov potential value function under any type of function approximation and for general state and action spaces. Numerical experiments confirm the convergence result in the lake game, a Markov potential game.
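
The update pattern described above, in which each agent keeps a local estimate of the joint policy parameters, averages that estimate with its neighbors' estimates over the communication network, and then takes a stochastic policy-gradient step on its own parameter block, can be sketched as follows. This is a minimal illustrative sketch and not the authors' implementation: the tabular softmax policies, the ring network and mixing matrix W, the stage reward, and the state dynamics are hypothetical placeholders standing in for the paper's general setting and its lake-game experiment.

import numpy as np

rng = np.random.default_rng(0)
n_agents, n_states, n_actions = 3, 4, 2
n_iters, horizon, step = 500, 10, 0.05

# Doubly stochastic mixing matrix for a 3-agent ring network (assumption).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

# theta[i, j] holds agent i's local estimate of agent j's softmax policy parameters.
theta = rng.normal(0.0, 0.1, (n_agents, n_agents, n_states, n_actions))

def policy(params, s):
    z = params[s] - params[s].max()
    p = np.exp(z)
    return p / p.sum()

def local_grad(est, agent):
    # REINFORCE-style stochastic gradient estimate for `agent`'s own block,
    # obtained by rolling out the joint policy implied by its local estimates.
    grad = np.zeros((n_states, n_actions))
    s, ret, steps = 0, 0.0, []
    for _ in range(horizon):
        actions = [rng.choice(n_actions, p=policy(est[j], s)) for j in range(n_agents)]
        # Hypothetical stage reward (stand-in for the lake game): all agents
        # are rewarded when every agent picks action 0 in a low-index state.
        r = 1.0 if all(a == 0 for a in actions) and s < 2 else 0.0
        g = -policy(est[agent], s)
        g[actions[agent]] += 1.0           # gradient of log softmax at (s, a_i)
        steps.append((s, g))
        ret += r
        s = (s + sum(actions)) % n_states  # hypothetical state dynamics
    for s_t, g in steps:
        grad[s_t] += ret * g
    return grad

for _ in range(n_iters):
    # (1) Consensus step: average estimates of the joint parameters over neighbors.
    theta = np.einsum('ik,kjsa->ijsa', W, theta)
    # (2) Local step: each agent updates only its own block with its own reward.
    for i in range(n_agents):
        theta[i, i] += step * local_grad(theta[i], i)

Under the Markov potential assumption, the agent-wise ascent directions used in step (2) align with the partial gradients of the common potential value function, which is what drives the joint policy toward a first-order stationary point of that function.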

Cite this Paper


BibTeX
@InProceedings{pmlr-v211-aydin23a,
  title     = {Policy Gradient Play with Networked Agents in Markov Potential Games},
  author    = {Aydin, Sarper and Eksin, Ceyhun},
  booktitle = {Proceedings of The 5th Annual Learning for Dynamics and Control Conference},
  pages     = {184--195},
  year      = {2023},
  editor    = {Matni, Nikolai and Morari, Manfred and Pappas, George J.},
  volume    = {211},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v211/aydin23a/aydin23a.pdf},
  url       = {https://proceedings.mlr.press/v211/aydin23a.html},
  abstract  = {We introduce a distributed policy gradient play algorithm with networked agents playing Markov potential games. Agents receive rewards at each stage of the game that depend on the joint actions of the agents, given a common dynamic state. Agents implement parameterized and differentiable policies to take actions against each other. A Markov potential game assumes the existence of potential value functions. In a differentiable Markov potential game, the partial gradients of the potential function with respect to an agent's individual parameters equal that agent's local value gradients. In this work, agents receive information on other agents' parameters via a communication network, in addition to their rewards. Agents then use stochastic gradients with respect to local estimates of the joint policy parameters to update their own policy parameters. We show that the agents' joint policy converges to a first-order stationary point of the Markov potential value function under any type of function approximation and for general state and action spaces. Numerical experiments confirm the convergence result in the lake game, a Markov potential game.}
}
Endnote
%0 Conference Paper
%T Policy Gradient Play with Networked Agents in Markov Potential Games
%A Sarper Aydin
%A Ceyhun Eksin
%B Proceedings of The 5th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Nikolai Matni
%E Manfred Morari
%E George J. Pappas
%F pmlr-v211-aydin23a
%I PMLR
%P 184--195
%U https://proceedings.mlr.press/v211/aydin23a.html
%V 211
%X We introduce a distributed policy gradient play algorithm with networked agents playing Markov potential games. Agents receive rewards at each stage of the game that depend on the joint actions of the agents, given a common dynamic state. Agents implement parameterized and differentiable policies to take actions against each other. A Markov potential game assumes the existence of potential value functions. In a differentiable Markov potential game, the partial gradients of the potential function with respect to an agent's individual parameters equal that agent's local value gradients. In this work, agents receive information on other agents' parameters via a communication network, in addition to their rewards. Agents then use stochastic gradients with respect to local estimates of the joint policy parameters to update their own policy parameters. We show that the agents' joint policy converges to a first-order stationary point of the Markov potential value function under any type of function approximation and for general state and action spaces. Numerical experiments confirm the convergence result in the lake game, a Markov potential game.
APA
Aydin, S. & Eksin, C. (2023). Policy Gradient Play with Networked Agents in Markov Potential Games. Proceedings of The 5th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 211:184-195. Available from https://proceedings.mlr.press/v211/aydin23a.html.
