Refined Sample Complexity for Markov Games with Independent Linear Function Approximation (Extended Abstract)

Yan Dai, Qiwen Cui, Simon S. Du
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:1260-1261, 2024.

Abstract

Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the “curse of multi-agents” (i.e., the algorithm's performance degrading exponentially with the number of agents) was unavoidable, until several recent works (Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023) showed otherwise. While these works resolved the curse of multi-agents, when the state spaces are prohibitively large and (linear) function approximation is deployed, they either suffered a slower convergence rate of $O(T^{-1/4})$ or incurred a polynomial dependency on the number of actions $A_{\max}$, a dependency that is avoidable in the single-agent case even when the loss functions vary arbitrarily with time. This paper first refines the AVLPR framework of Wang et al. (2023) with the insight of designing \textit{data-dependent} (i.e., stochastic) pessimistic estimates of the sub-optimality gap, which allows a broader choice of plug-in algorithms. When specialized to MGs with independent linear function approximation, we propose novel \textit{action-dependent bonuses} to cover occasionally extreme estimation errors. With the help of state-of-the-art techniques from the single-agent RL literature, we give the first algorithm that simultaneously tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids any $\text{poly}(A_{\max})$ dependency.
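
As an editorial illustration (not part of the paper's abstract), the short LaTeX derivation below spells out why the rate improvement matters: converting each convergence rate into the number of iterations $T$ needed to drive the equilibrium gap below $\varepsilon$ gives

% Standard rate-to-sample-complexity conversion (illustrative only):
% a gap bound of c * T^{-1/2} yields T = O(eps^{-2}) iterations,
% whereas a gap bound of c * T^{-1/4} yields T = O(eps^{-4}).
\[
c\,T^{-1/2} \le \varepsilon \;\Longleftrightarrow\; T \ge c^{2}\varepsilon^{-2},
\qquad\text{versus}\qquad
c\,T^{-1/4} \le \varepsilon \;\Longleftrightarrow\; T \ge c^{4}\varepsilon^{-4}.
\]

Here $c$ denotes a generic rate constant hiding problem-dependent factors; per the abstract, for the proposed algorithm these hidden factors contain no $\text{poly}(A_{\max})$ dependency.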

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-dai24a,
  title     = {Refined Sample Complexity for Markov Games with Independent Linear Function Approximation (Extended Abstract)},
  author    = {Dai, Yan and Cui, Qiwen and Du, Simon S.},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {1260--1261},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/dai24a/dai24a.pdf},
  url       = {https://proceedings.mlr.press/v247/dai24a.html},
  abstract  = {Markov Games (MG) is an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the “curse of multi-agents” (i.e., the algorithmic performance drops exponentially with the number of agents) is unavoidable until several recent works (Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023). While these works resolved the curse of multi-agents, when the state spaces are prohibitively large and (linear) function approximations are deployed, they either had a slower convergence rate of $O(T^{-1/4})$ or brought a polynomial dependency on the number of actions $A_{\max}$ – which is avoidable in single-agent cases even when the loss functions can arbitrarily vary with time. This paper first refines the AVLPR framework by Wang et al. (2023), with an insight of designing \textit{data-dependent} (i.e., stochastic) pessimistic estimation of the sub-optimality gap, allowing a broader choice of plug-in algorithms. When specialized to MGs with independent linear function approximations, we propose novel \textit{action-dependent bonuses} to cover occasionally extreme estimation errors. With the help of state-of-the-art techniques from the single-agent RL literature, we give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously.}
}
Endnote
%0 Conference Paper
%T Refined Sample Complexity for Markov Games with Independent Linear Function Approximation (Extended Abstract)
%A Yan Dai
%A Qiwen Cui
%A Simon S. Du
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-dai24a
%I PMLR
%P 1260--1261
%U https://proceedings.mlr.press/v247/dai24a.html
%V 247
%X Markov Games (MG) is an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the “curse of multi-agents” (i.e., the algorithmic performance drops exponentially with the number of agents) is unavoidable until several recent works (Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023). While these works resolved the curse of multi-agents, when the state spaces are prohibitively large and (linear) function approximations are deployed, they either had a slower convergence rate of $O(T^{-1/4})$ or brought a polynomial dependency on the number of actions $A_{\max}$ – which is avoidable in single-agent cases even when the loss functions can arbitrarily vary with time. This paper first refines the AVLPR framework by Wang et al. (2023), with an insight of designing \textit{data-dependent} (i.e., stochastic) pessimistic estimation of the sub-optimality gap, allowing a broader choice of plug-in algorithms. When specialized to MGs with independent linear function approximations, we propose novel \textit{action-dependent bonuses} to cover occasionally extreme estimation errors. With the help of state-of-the-art techniques from the single-agent RL literature, we give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously.
APA
Dai, Y., Cui, Q., & Du, S. S. (2024). Refined Sample Complexity for Markov Games with Independent Linear Function Approximation (Extended Abstract). Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:1260-1261. Available from https://proceedings.mlr.press/v247/dai24a.html.