Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

Ziyi Chen, Yi Zhou, Rong-Rong Chen, Shaofeng Zou
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:3794-3834, 2022.

Abstract

Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents’ sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents’ local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to ensure high sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities O(ϵ^{-2} ln ϵ^{-1}) and O(ϵ^{-3} ln ϵ^{-1}), respectively, and achieve an improved communication complexity O(ϵ^{-1} ln ϵ^{-1}). Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.
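As a rough illustration of the communication pattern described in the abstract, the sketch below shows agents that exchange only noisy local rewards through one consensus (gossip) round and then perform mini-batch local policy-gradient updates. This is not the authors' implementation: the mixing matrix W, the helper functions local_reward and local_score, and all step sizes, batch sizes, and dimensions are hypothetical placeholders chosen only to make the pattern concrete.

import numpy as np

# Minimal sketch (assumptions, not the paper's algorithm): N agents share only
# noisy local rewards via one averaging round over a doubly stochastic mixing
# matrix W, then each agent applies a mini-batch policy-gradient update to its
# own local actor parameters. No states, actions, or policies are exchanged.

rng = np.random.default_rng(0)
N, d, B = 4, 8, 32            # number of agents, parameter dimension, mini-batch size
W = np.full((N, N), 1.0 / N)  # mixing matrix (here: complete-graph averaging)
theta = [rng.normal(size=d) for _ in range(N)]  # local actor parameters

def local_reward(i, rng):
    """Hypothetical noisy local reward observed by agent i."""
    return 1.0 + 0.1 * rng.normal()

def local_score(i, theta_i, rng):
    """Hypothetical score-function (grad log-policy) sample for agent i."""
    return rng.normal(size=theta_i.shape)

alpha = 0.05                  # actor step size
for t in range(10):           # outer actor iterations
    # 1) Agents exchange only noisy rewards; one consensus round gives each
    #    agent an estimate of the team-average reward.
    r_local = np.array([local_reward(i, rng) for i in range(N)])
    r_bar = W @ r_local

    # 2) Mini-batch local policy-gradient update; batching reduces how often
    #    communication rounds are needed relative to per-sample updates.
    for i in range(N):
        g = np.mean([r_bar[i] * local_score(i, theta[i], rng) for _ in range(B)],
                    axis=0)
        theta[i] = theta[i] + alpha * g  # gradient ascent on the return estimate

The sketch only captures the "share noisy rewards, update locally in mini-batches" structure; the critic updates, the Markovian sampling analysis, and the adaptive mini-batch size used in the decentralized Markovian SGD subroutine for NAC are what yield the complexities stated in the abstract and are not reproduced here.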

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-chen22ah,
  title     = {Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis},
  author    = {Chen, Ziyi and Zhou, Yi and Chen, Rong-Rong and Zou, Shaofeng},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {3794--3834},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/chen22ah/chen22ah.pdf},
  url       = {https://proceedings.mlr.press/v162/chen22ah.html},
  abstract  = {Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents’ sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents’ local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to ensure high sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $\mathcal{O}(\epsilon^{-2}\ln\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$, respectively, and achieve an improved communication complexity $\mathcal{O}(\epsilon^{-1}\ln\epsilon^{-1})$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.}
}
Endnote
%0 Conference Paper
%T Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis
%A Ziyi Chen
%A Yi Zhou
%A Rong-Rong Chen
%A Shaofeng Zou
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-chen22ah
%I PMLR
%P 3794--3834
%U https://proceedings.mlr.press/v162/chen22ah.html
%V 162
%X Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents’ sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents’ local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to ensure high sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $\mathcal{O}(\epsilon^{-2}\ln\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$, respectively, and achieve an improved communication complexity $\mathcal{O}(\epsilon^{-1}\ln\epsilon^{-1})$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.
APA
Chen, Z., Zhou, Y., Chen, R., & Zou, S. (2022). Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:3794-3834. Available from https://proceedings.mlr.press/v162/chen22ah.html.