Refined Sample Complexity for Markov Games with Independent Linear Function Approximation (Extended Abstract)
Proceedings of the Thirty-Seventh Conference on Learning Theory, PMLR 247:1260-1261, 2024.
Abstract
Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the “curse of multi-agents” (i.e., that the sample complexity scales exponentially with the number of agents) is unavoidable, until several recent works resolved it (Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023). However, when the state spaces are prohibitively large and (linear) function approximation is deployed, these works either suffer a slower convergence rate of $O(T^{-1/4})$ or incur a polynomial dependency on the number of actions $A_{\max}$, a dependency that is avoidable in the single-agent case even when the loss functions vary arbitrarily with time. This paper first refines the AVLPR framework of Wang et al. (2023) with the insight of designing \textit{data-dependent} (i.e., stochastic) pessimistic estimates of the sub-optimality gap, which allows a broader choice of plug-in algorithms. When specialized to MGs with independent linear function approximation, we propose novel \textit{action-dependent bonuses} to cover occasionally extreme estimation errors. With the help of state-of-the-art techniques from the single-agent RL literature, we give the first algorithm that simultaneously tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency.
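For concreteness, a minimal sketch of the quantity being estimated, stated in the standard notation of the MARL literature (the paper's exact definitions and normalizations may differ): in an $m$-player general-sum MG, the sub-optimality gap of a joint policy $\pi$ is
\[
\mathrm{Gap}(\pi) \;=\; \max_{i \in [m]} \Bigl( \max_{\pi_i'} V_i^{\pi_i', \pi_{-i}} \;-\; V_i^{\pi} \Bigr),
\]
where $V_i^{\pi}$ denotes player $i$'s expected return under the joint policy $\pi$ and $\pi_{-i}$ collects all other players' policies. In this notation, a \textit{data-dependent} pessimistic estimate $\widehat{\mathrm{Gap}}$ is a stochastic quantity satisfying $\widehat{\mathrm{Gap}}(\pi) \ge \mathrm{Gap}(\pi)$ with high probability, as opposed to a deterministic worst-case upper bound.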