[edit]

# Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

*Proceedings of Thirty Sixth Conference on Learning Theory*, PMLR 195:2651-2652, 2023.

#### Abstract

We propose a new model, \emph{independent linear Markov game}, for multi-agent reinforcement learning with a large state space and a large number of agents.This is a class of Markov games with \emph{independent} linear function approximation, where each agent has its own function approximation for the state-action value functions that are {\it marginalized} by other players’ policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with \emph{each agent’s own function class complexity}, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle {\it non-stationarity} incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria and exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games, with applications in learning in congestion games.In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ in \citep{daskalakis2022complexity}, where $\epsilon$ is the desired accuracy, and also significantly improves other problem parameters. Furthermore, we design the first provably efficient algorithm for learning Markov CE that breaks the curse of multiagents.