Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time

Weichen Wang, Jiequn Han, Zhuoran Yang, Zhaoran Wang
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:10772-10782, 2021.

Abstract

Recent years have witnessed the success of multi-agent reinforcement learning, which has motivated new research directions for mean-field control (MFC) and mean-field game (MFG), since a multi-agent system can be well approximated by a mean-field problem when the number of agents grows large. In this paper, we study the policy gradient (PG) method for linear-quadratic mean-field control and games, where we assume each agent has identical linear state transitions and quadratic cost functions. While most recent work on policy gradient for MFC and MFG is based on discrete-time models, we focus on a continuous-time model, where some of our analysis techniques may be valuable to interested readers. For both the MFC and the MFG, we provide a PG update and show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation. For the MFG, we also provide sufficient conditions for the existence and uniqueness of the Nash equilibrium.
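For readers unfamiliar with the setting, the display below sketches a generic continuous-time linear-quadratic mean-field model of the kind the abstract refers to, together with a linear feedback policy whose gain matrices a policy-gradient method would update. The specific matrices (A, \bar{A}, B, Q, \bar{Q}, R), the discounted infinite-horizon cost, and the parametrization u_t = -K x_t - L \bar{x}_t are illustrative assumptions and need not match the paper's exact formulation.

% Illustrative continuous-time LQ mean-field setup (assumed form, not the paper's exact model).
% Dynamics of a representative agent, with mean-field term \bar{x}_t = \mathbb{E}[x_t]:
\[
  \mathrm{d}x_t \;=\; \bigl(A x_t + \bar{A}\,\bar{x}_t + B u_t\bigr)\,\mathrm{d}t \;+\; \sigma\,\mathrm{d}W_t .
\]
% Quadratic cost, minimized jointly in MFC or best-responded to in MFG
% (discounted infinite horizon chosen here purely for illustration):
\[
  J(u) \;=\; \mathbb{E}\int_0^\infty e^{-\rho t}\,\bigl(x_t^\top Q x_t + \bar{x}_t^\top \bar{Q}\,\bar{x}_t + u_t^\top R u_t\bigr)\,\mathrm{d}t .
\]
% Linear feedback policy; a policy-gradient iteration updates the gains, e.g.
% K_{k+1} = K_k - \eta\,\nabla_K J(K_k, L_k),  L_{k+1} = L_k - \eta\,\nabla_L J(K_k, L_k):
\[
  u_t \;=\; -K x_t \;-\; L\,\bar{x}_t .
\]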

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-wang21j,
  title     = {Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time},
  author    = {Wang, Weichen and Han, Jiequn and Yang, Zhuoran and Wang, Zhaoran},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {10772--10782},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/wang21j/wang21j.pdf},
  url       = {https://proceedings.mlr.press/v139/wang21j.html},
  abstract  = {Recent years have witnessed the success of multi-agent reinforcement learning, which has motivated new research directions for mean-field control (MFC) and mean-field game (MFG), as the multi-agent system can be well approximated by a mean-field problem when the number of agents grows to be very large. In this paper, we study the policy gradient (PG) method for the linear-quadratic mean-field control and game, where we assume each agent has identical linear state transitions and quadratic cost functions. While most recent works on policy gradient for MFC and MFG are based on discrete-time models, we focus on a continuous-time model where some of our analyzing techniques could be valuable to the interested readers. For both the MFC and the MFG, we provide PG update and show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation. For the MFG, we also provide sufficient conditions for the existence and uniqueness of the Nash equilibrium.}
}
Endnote
%0 Conference Paper
%T Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time
%A Weichen Wang
%A Jiequn Han
%A Zhuoran Yang
%A Zhaoran Wang
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-wang21j
%I PMLR
%P 10772--10782
%U https://proceedings.mlr.press/v139/wang21j.html
%V 139
%X Recent years have witnessed the success of multi-agent reinforcement learning, which has motivated new research directions for mean-field control (MFC) and mean-field game (MFG), as the multi-agent system can be well approximated by a mean-field problem when the number of agents grows to be very large. In this paper, we study the policy gradient (PG) method for the linear-quadratic mean-field control and game, where we assume each agent has identical linear state transitions and quadratic cost functions. While most recent works on policy gradient for MFC and MFG are based on discrete-time models, we focus on a continuous-time model where some of our analyzing techniques could be valuable to the interested readers. For both the MFC and the MFG, we provide PG update and show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation. For the MFG, we also provide sufficient conditions for the existence and uniqueness of the Nash equilibrium.
APA
Wang, W., Han, J., Yang, Z. & Wang, Z. (2021). Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:10772-10782. Available from https://proceedings.mlr.press/v139/wang21j.html.
