Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

Ziyi Chen, Yi Zhou, Rong-Rong Chen, Shaofeng Zou
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:3794-3834, 2022.

Abstract

Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents’ sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents’ local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to ensure high sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities O(ϵ^{-2} ln ϵ^{-1}) and O(ϵ^{-3} ln ϵ^{-1}), respectively, and achieve an improved communication complexity O(ϵ^{-1} ln ϵ^{-1}). Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.
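As a rough illustration of the communication pattern described in the abstract, the sketch below shows agents that exchange only noisy local rewards through one consensus (gossip) round and then perform mini-batch local policy-gradient updates. This is not the authors' implementation: the mixing matrix W, the helper functions local_reward and local_score, and all step sizes, batch sizes, and dimensions are hypothetical placeholders chosen only to make the pattern concrete.

import numpy as np

# Minimal sketch (assumptions, not the paper's algorithm): N agents share only
# noisy local rewards via one averaging round over a doubly stochastic mixing
# matrix W, then each agent applies a mini-batch policy-gradient update to its
# own local actor parameters. No states, actions, or policies are exchanged.

rng = np.random.default_rng(0)
N, d, B = 4, 8, 32            # number of agents, parameter dimension, mini-batch size
W = np.full((N, N), 1.0 / N)  # mixing matrix (here: complete-graph averaging)
theta = [rng.normal(size=d) for _ in range(N)]  # local actor parameters

def local_reward(i, rng):
    """Hypothetical noisy local reward observed by agent i."""
    return 1.0 + 0.1 * rng.normal()

def local_score(i, theta_i, rng):
    """Hypothetical score-function (grad log-policy) sample for agent i."""
    return rng.normal(size=theta_i.shape)

alpha = 0.05                  # actor step size
for t in range(10):           # outer actor iterations
    # 1) Agents exchange only noisy rewards; one consensus round gives each
    #    agent an estimate of the team-average reward.
    r_local = np.array([local_reward(i, rng) for i in range(N)])
    r_bar = W @ r_local

    # 2) Mini-batch local policy-gradient update; batching reduces how often
    #    communication rounds are needed relative to per-sample updates.
    for i in range(N):
        g = np.mean([r_bar[i] * local_score(i, theta[i], rng) for _ in range(B)],
                    axis=0)
        theta[i] = theta[i] + alpha * g  # gradient ascent on the return estimate

The sketch only captures the "share noisy rewards, update locally in mini-batches" structure; the critic updates, the Markovian sampling analysis, and the adaptive mini-batch size used in the decentralized Markovian SGD subroutine for NAC are what yield the complexities stated in the abstract and are not reproduced here.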

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-chen22ah,
  title     = {Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis},
  author    = {Chen, Ziyi and Zhou, Yi and Chen, Rong-Rong and Zou, Shaofeng},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {3794--3834},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/chen22ah/chen22ah.pdf},
  url       = {https://proceedings.mlr.press/v162/chen22ah.html},
  abstract  = {Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents’ sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents’ local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to ensure high sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $\mathcal{O}(\epsilon^{-2}\ln\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$, respectively, and achieve an improved communication complexity $\mathcal{O}(\epsilon^{-1}\ln\epsilon^{-1})$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.}
}
Endnote
%0 Conference Paper
%T Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis
%A Ziyi Chen
%A Yi Zhou
%A Rong-Rong Chen
%A Shaofeng Zou
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-chen22ah
%I PMLR
%P 3794--3834
%U https://proceedings.mlr.press/v162/chen22ah.html
%V 162
%X Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents’ sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents’ local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to ensure high sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $\mathcal{O}(\epsilon^{-2}\ln\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$, respectively, and achieve an improved communication complexity $\mathcal{O}(\epsilon^{-1}\ln\epsilon^{-1})$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.
APA
Chen, Z., Zhou, Y., Chen, R., & Zou, S. (2022). Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:3794-3834. Available from https://proceedings.mlr.press/v162/chen22ah.html.