Scalable Natural Policy Gradient for General-Sum Linear Quadratic Games with Known Parameters
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:139-152, 2025.
Abstract
Consider a general-sum N-player linear-quadratic (LQ) game with stochastic dynamics over a finite time horizon. It is known that, under some mild assumptions, the Nash equilibrium (NE) strategies of the players can be obtained by a natural policy gradient algorithm. However, the traditional implementation of this algorithm requires complete state and action information from all agents and may not scale well with the number of agents. Assuming known problem parameters, we present an algorithm that uses state and action information only from neighboring agents, as determined by the graph describing the dynamic or cost coupling among the agents. We show that the proposed algorithm converges to an $\epsilon$-neighborhood of the NE, where the value of $\epsilon$ depends on the size of the local neighborhood of agents.
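For context, the sketch below illustrates the full-information natural policy gradient baseline that the abstract contrasts against, not the paper's scalable local-information algorithm. It is a minimal NumPy sketch under assumed notation: dynamics $x_{t+1} = A x_t + \sum_i B_i u_{i,t} + w_t$, linear feedback policies $u_{i,t} = -K_{i,t} x_t$, and per-player costs with matrices $Q_i, R_i$; the function name, step size, and iteration count are hypothetical.

```python
import numpy as np

def natural_pg_lq_game(A, B, Q, R, K, T, eta=0.05, iters=200):
    """Hypothetical full-information natural policy gradient loop for an
    N-player finite-horizon general-sum LQ game with known parameters.

    A: (n, n) dynamics matrix; B: list of (n, m_i) input matrices;
    Q: list of (n, n) state-cost matrices; R: list of (m_i, m_i) control
    costs; K: list of (T, m_i, n) time-varying feedback gains (u_i = -K_i x).
    """
    N = len(B)
    for _ in range(iters):
        # Closed-loop dynamics at each time step under the current gains.
        Acl = [A - sum(B[j] @ K[j][t] for j in range(N)) for t in range(T)]
        for i in range(N):
            # Backward pass: player i's cost-to-go matrices P_{i,t},
            # starting from the terminal condition P_{i,T} = Q_i.
            P = Q[i]
            E = [None] * T
            for t in reversed(range(T)):
                # Natural-gradient direction E_{i,t}: the state covariance
                # factor of the vanilla policy gradient 2 E_t Sigma_t is
                # removed, so the noise statistics drop out entirely.
                E[t] = (R[i] + B[i].T @ P @ B[i]) @ K[i][t] \
                       - B[i].T @ P @ (Acl[t] + B[i] @ K[i][t])
                # Lyapunov-type recursion for P_{i,t} under fixed policies.
                P = Q[i] + K[i][t].T @ R[i] @ K[i][t] + Acl[t].T @ P @ Acl[t]
            for t in range(T):
                K[i][t] -= eta * E[t]  # natural policy gradient step
    return K
```

Note that each player's update here uses the closed-loop matrix $A_{cl,t}$, i.e., the gains of all other players; the paper's contribution is to approximate this step using only information from graph neighbors, which introduces the $\epsilon$-sized bias in the convergence guarantee.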