Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games

Hongyi Guo, Zuyue Fu, Zhuoran Yang, Zhaoran Wang
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3899-3909, 2021.

Abstract

We study the global convergence and global optimality of the actor-critic algorithm applied to zero-sum two-player stochastic games in a decentralized manner. We focus on the single-timescale setting, where the critic is updated by applying the Bellman operator only once and the actor is updated by policy gradient with the information from the critic. Our algorithm is decentralized, as we assume that each player has no access to the actions of the other, which, in a way, protects the privacy of both players. Moreover, we consider linear function approximation for both the actor and the critic, and we prove that the sequence of joint policies generated by our decentralized linear algorithm converges to the minimax equilibrium at a sublinear rate \(\mathcal{O}(1/\sqrt{K})\), where \(K\) is the number of iterations. To the best of our knowledge, this is the first work to establish the global optimality and convergence of a decentralized actor-critic algorithm on zero-sum two-player stochastic games with linear function approximation.
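The single-timescale structure described in the abstract, where the critic takes one Bellman (TD) step and the actor takes one policy-gradient step with step sizes of the same order in every iteration, can be sketched as follows. This is an illustrative single-agent stand-in with linear features, not the paper's two-player decentralized algorithm; all names, sizes, and step sizes below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP with linear features (illustrative stand-in; the paper
# treats two-player zero-sum stochastic games).
n_states, n_actions, d = 4, 2, 6
gamma = 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions
R = rng.standard_normal((n_states, n_actions))                    # rewards
phi = rng.standard_normal((n_states, n_actions, d))               # features

def softmax_policy(theta, s):
    """Softmax policy over the linear scores phi(s, .) @ theta."""
    logits = phi[s] @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

w = np.zeros(d)           # critic: linear Q-value weights
theta = np.zeros(d)       # actor: linear policy weights
alpha, beta = 0.01, 0.01  # same-order step sizes: single timescale

s = 0
for k in range(500):
    p = softmax_policy(theta, s)
    a = rng.choice(n_actions, p=p)
    s_next = rng.choice(n_states, p=P[s, a])
    p_next = softmax_policy(theta, s_next)

    # Critic: apply the Bellman operator only once (one TD step).
    q_next = phi[s_next] @ w                      # Q-estimates at next state
    td_target = R[s, a] + gamma * (p_next @ q_next)
    td_err = td_target - phi[s, a] @ w
    w = w + alpha * td_err * phi[s, a]

    # Actor: one policy-gradient step using the critic's current estimate.
    grad_log = phi[s, a] - p @ phi[s]             # softmax score function
    theta = theta + beta * (phi[s, a] @ w) * grad_log
    s = s_next
```

The point of the sketch is that both updates share the same loop and comparable step sizes, in contrast to two-timescale schemes that let the critic converge between actor updates.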

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-guo21a, title = {Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games}, author = {Guo, Hongyi and Fu, Zuyue and Yang, Zhuoran and Wang, Zhaoran}, booktitle = {Proceedings of the 38th International Conference on Machine Learning}, pages = {3899--3909}, year = {2021}, editor = {Meila, Marina and Zhang, Tong}, volume = {139}, series = {Proceedings of Machine Learning Research}, month = {18--24 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v139/guo21a/guo21a.pdf}, url = {https://proceedings.mlr.press/v139/guo21a.html}, abstract = {We study the global convergence and global optimality of the actor-critic algorithm applied to zero-sum two-player stochastic games in a decentralized manner. We focus on the single-timescale setting, where the critic is updated by applying the Bellman operator only once and the actor is updated by policy gradient with the information from the critic. Our algorithm is decentralized, as we assume that each player has no access to the actions of the other, which, in a way, protects the privacy of both players. Moreover, we consider linear function approximation for both the actor and the critic, and we prove that the sequence of joint policies generated by our decentralized linear algorithm converges to the minimax equilibrium at a sublinear rate \(\mathcal{O}(1/\sqrt{K})\), where \(K\) is the number of iterations. To the best of our knowledge, this is the first work to establish the global optimality and convergence of a decentralized actor-critic algorithm on zero-sum two-player stochastic games with linear function approximation.} }
Endnote
%0 Conference Paper %T Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games %A Hongyi Guo %A Zuyue Fu %A Zhuoran Yang %A Zhaoran Wang %B Proceedings of the 38th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2021 %E Marina Meila %E Tong Zhang %F pmlr-v139-guo21a %I PMLR %P 3899--3909 %U https://proceedings.mlr.press/v139/guo21a.html %V 139 %X We study the global convergence and global optimality of the actor-critic algorithm applied to zero-sum two-player stochastic games in a decentralized manner. We focus on the single-timescale setting, where the critic is updated by applying the Bellman operator only once and the actor is updated by policy gradient with the information from the critic. Our algorithm is decentralized, as we assume that each player has no access to the actions of the other, which, in a way, protects the privacy of both players. Moreover, we consider linear function approximation for both the actor and the critic, and we prove that the sequence of joint policies generated by our decentralized linear algorithm converges to the minimax equilibrium at a sublinear rate \(\mathcal{O}(1/\sqrt{K})\), where \(K\) is the number of iterations. To the best of our knowledge, this is the first work to establish the global optimality and convergence of a decentralized actor-critic algorithm on zero-sum two-player stochastic games with linear function approximation.
APA
Guo, H., Fu, Z., Yang, Z. &amp; Wang, Z. (2021). Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:3899-3909. Available from https://proceedings.mlr.press/v139/guo21a.html.