Multi-Agent Reinforcement Learning with Reward Delays
Proceedings of The 5th Annual Learning for Dynamics and Control Conference, PMLR 211:692-704, 2023.
Abstract
This paper considers multi-agent reinforcement learning (MARL) where the rewards are received after delays and the delay time varies across agents and across time steps. Based on the V-learning framework, this paper proposes MARL algorithms that efficiently deal with reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) with rate $\tilde{\mathcal{O}}\left(\frac{H^3\sqrt{S\mathcal{T}_K}}{K}+\frac{H^3\sqrt{SA}}{\sqrt{K}}\right)$, where $K$ is the number of episodes, $H$ is the planning horizon, $S$ is the size of the state space, $A$ is the size of the largest action space, and $\mathcal{T}_K$ is the measure of total delay formally defined in the paper. Moreover, our algorithm is extended to cases with infinite delays through a reward skipping scheme, which achieves a convergence rate similar to that of the finite-delay case.
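To get a feel for how the delay measure enters the bound, the two terms of the rate can be evaluated numerically. The sketch below is illustrative only and is not taken from the paper: it assumes the rate reads H^3 sqrt(S T_K)/K + H^3 sqrt(S A)/sqrt(K), drops all constants and logarithmic factors hidden by the tilde-O notation, and uses a hypothetical function name and made-up parameter values.

```python
import math


def cce_rate_terms(H, S, A, K, T_K):
    """Return (delay_term, delay_free_term) of the assumed rate.

    Constants and log factors are ignored, so only the relative
    scaling of the two terms is meaningful, not their absolute size.
    """
    delay_term = H**3 * math.sqrt(S * T_K) / K
    delay_free_term = H**3 * math.sqrt(S * A) / math.sqrt(K)
    return delay_term, delay_free_term


# Hypothetical values chosen so that T_K = A * K.
delay_term, delay_free_term = cce_rate_terms(H=10, S=100, A=4, K=10_000, T_K=40_000)
print(delay_term, delay_free_term)
```

Under this reading, the two terms coincide exactly when T_K = A K, so the delay-dependent term is the smaller of the two whenever T_K is at most A K.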