Multi-Agent Reinforcement Learning with Reward Delays

Yuyang Zhang, Runyu Zhang, Yuantao Gu, Na Li
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:692-704, 2023.

Abstract

This paper considers multi-agent reinforcement learning (MARL) where rewards are received after delays and the delay time varies across agents and across time steps. Based on the V-learning framework, this paper proposes MARL algorithms that efficiently deal with reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) at rate $\tilde{\mathcal{O}}(\frac{H^3\sqrt{S\mathcal{T}_K}}{K}+\frac{H^3\sqrt{SA}}{\sqrt{K}})$, where $K$ is the number of episodes, $H$ is the planning horizon, $S$ is the size of the state space, $A$ is the size of the largest action space, and $\mathcal{T}_K$ is a measure of total delay formally defined in the paper. Moreover, the algorithm is extended to cases with infinite delays through a reward-skipping scheme, achieving a convergence rate similar to that of the finite-delay case.
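
To make the delayed-reward setup concrete, below is a minimal illustrative sketch (in Python) of how rewards that arrive with episode-level delays could be buffered until they become observable, with overly long delays dropped in the spirit of the reward-skipping scheme mentioned above. This is not the paper's V-learning algorithm; the class name DelayedRewardBuffer, the max_delay threshold, and the method names are assumptions made purely for illustration.

```python
# Illustrative sketch only: buffering rewards that arrive with episode-level delays.
# Delays may differ across agents and time steps; rewards whose delay exceeds a
# threshold are skipped, loosely mirroring a reward-skipping idea for unbounded delays.
from collections import defaultdict


class DelayedRewardBuffer:
    """Holds rewards until the episode in which they become observable."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        # arrival episode -> list of (origin episode, reward)
        self.pending = defaultdict(list)

    def add(self, origin_episode, reward, delay):
        """Register a reward generated at `origin_episode` arriving `delay` episodes later."""
        if delay > self.max_delay:
            return  # skip rewards with excessively long delays
        self.pending[origin_episode + delay].append((origin_episode, reward))

    def pop_arrived(self, current_episode):
        """Return all rewards that become observable at `current_episode`."""
        return self.pending.pop(current_episode, [])


if __name__ == "__main__":
    buf = DelayedRewardBuffer(max_delay=3)
    buf.add(origin_episode=0, reward=1.0, delay=2)  # observable at episode 2
    buf.add(origin_episode=1, reward=0.5, delay=5)  # delay too long: skipped
    print(buf.pop_arrived(2))  # [(0, 1.0)]
```

In an episodic learner, the rewards returned by `pop_arrived` would be credited back to the episodes that generated them before performing the value update; how that credit assignment interacts with the learning rate is exactly what determines the dependence on the total delay $\mathcal{T}_K$ in the rate above.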

Cite this Paper


BibTeX
@InProceedings{pmlr-v211-zhang23c,
  title     = {Multi-Agent Reinforcement Learning with Reward Delays},
  author    = {Zhang, Yuyang and Zhang, Runyu and Gu, Yuantao and Li, Na},
  booktitle = {Proceedings of The 5th Annual Learning for Dynamics and Control Conference},
  pages     = {692--704},
  year      = {2023},
  editor    = {Matni, Nikolai and Morari, Manfred and Pappas, George J.},
  volume    = {211},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--16 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v211/zhang23c/zhang23c.pdf},
  url       = {https://proceedings.mlr.press/v211/zhang23c.html},
  abstract  = {This paper considers multi-agent reinforcement learning (MARL) where the rewards are received after delays and the delay time varies across agents and across time steps. Based on the V-learning framework, this paper proposes MARL algorithms that efficiently deal with reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) with rate $\tilde{\mathcal{O}}(\frac{H^3\sqrt{S\mathcal{T}_K}}{K}+\frac{H^3\sqrt{SA}}{\sqrt{K}})$ where $K$ is the number of episodes, $H$ is the planning horizon, $S$ is the size of the state space, $A$ is the size of the largest action space, and $\mathcal{T}_K$ is the measure of total delay formally defined in the paper. Moreover, our algorithm is extended to cases with infinite delays through a reward skipping scheme. It achieves convergence rate similar to the finite delay case.}
}
Endnote
%0 Conference Paper
%T Multi-Agent Reinforcement Learning with Reward Delays
%A Yuyang Zhang
%A Runyu Zhang
%A Yuantao Gu
%A Na Li
%B Proceedings of The 5th Annual Learning for Dynamics and Control Conference
%C Proceedings of Machine Learning Research
%D 2023
%E Nikolai Matni
%E Manfred Morari
%E George J. Pappas
%F pmlr-v211-zhang23c
%I PMLR
%P 692--704
%U https://proceedings.mlr.press/v211/zhang23c.html
%V 211
%X This paper considers multi-agent reinforcement learning (MARL) where the rewards are received after delays and the delay time varies across agents and across time steps. Based on the V-learning framework, this paper proposes MARL algorithms that efficiently deal with reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) with rate $\tilde{\mathcal{O}}(\frac{H^3\sqrt{S\mathcal{T}_K}}{K}+\frac{H^3\sqrt{SA}}{\sqrt{K}})$ where $K$ is the number of episodes, $H$ is the planning horizon, $S$ is the size of the state space, $A$ is the size of the largest action space, and $\mathcal{T}_K$ is the measure of total delay formally defined in the paper. Moreover, our algorithm is extended to cases with infinite delays through a reward skipping scheme. It achieves convergence rate similar to the finite delay case.
APA
Zhang, Y., Zhang, R., Gu, Y., & Li, N. (2023). Multi-Agent Reinforcement Learning with Reward Delays. Proceedings of The 5th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 211:692-704. Available from https://proceedings.mlr.press/v211/zhang23c.html.