MANSA: Learning Fast and Slow in Multi-Agent Systems

David Henry Mguni; Haojun Chen; Taher Jafferjee; Jianhong Wang; Longfei Yue; Xidong Feng; Stephen Marcus Mcaleer; Feifei Tong; Jun Wang; Yaodong Yang

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:24631-24658, 2023.

Abstract

In multi-agent reinforcement learning (MARL), independent learning (IL) often shows remarkable performance and easily scales with the number of agents. Yet, using IL can be inefficient and risks failing to train successfully, particularly in scenarios that require agents to coordinate their actions. Using centralised learning (CL) enables MARL agents to quickly learn how to coordinate their behaviour, but employing CL everywhere is often prohibitively expensive in real-world applications. Besides, using CL in value-based methods often needs strong representational constraints (e.g. the individual-global-max condition) that can lead to poor performance if violated. In this paper, we introduce a novel plug & play IL framework named Multi-Agent Network Selection Algorithm (MANSA) which selectively employs CL only at states that require coordination. At its core, MANSA has an additional agent that uses switching controls to quickly learn the best states to activate CL during training, using CL only where necessary and vastly reducing the computational burden of CL. Our theory proves MANSA preserves cooperative MARL convergence properties, boosts IL performance and can optimally make use of a fixed budget on the number of CL calls. We show empirically in Level-based Foraging (LBF) and StarCraft Multi-agent Challenge (SMAC) that MANSA achieves fast, superior and more reliable performance while making 40% fewer CL calls in SMAC and using only 1% of CL calls in LBF.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-mguni23a,
  title = 	 {{MANSA}: Learning Fast and Slow in Multi-Agent Systems},
  author =       {Mguni, David Henry and Chen, Haojun and Jafferjee, Taher and Wang, Jianhong and Yue, Longfei and Feng, Xidong and Mcaleer, Stephen Marcus and Tong, Feifei and Wang, Jun and Yang, Yaodong},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {24631--24658},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/mguni23a/mguni23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/mguni23a.html},
  abstract = 	 {In multi-agent reinforcement learning (MARL), independent learning (IL) often shows remarkable performance and easily scales with the number of agents. Yet, using IL can be inefficient and risks failing to train successfully, particularly in scenarios that require agents to coordinate their actions. Using centralised learning (CL) enables MARL agents to quickly learn how to coordinate their behaviour, but employing CL everywhere is often prohibitively expensive in real-world applications. Besides, using CL in value-based methods often needs strong representational constraints (e.g. the individual-global-max condition) that can lead to poor performance if violated. In this paper, we introduce a novel plug \& play IL framework named Multi-Agent Network Selection Algorithm (MANSA) which selectively employs CL only at states that require coordination. At its core, MANSA has an additional agent that uses switching controls to quickly learn the best states to activate CL during training, using CL only where necessary and vastly reducing the computational burden of CL. Our theory proves MANSA preserves cooperative MARL convergence properties, boosts IL performance and can optimally make use of a fixed budget on the number of CL calls. We show empirically in Level-based Foraging (LBF) and StarCraft Multi-agent Challenge (SMAC) that MANSA achieves fast, superior and more reliable performance while making 40\% fewer CL calls in SMAC and using only 1\% of CL calls in LBF.}
}

Endnote

%0 Conference Paper
%T MANSA: Learning Fast and Slow in Multi-Agent Systems
%A David Henry Mguni
%A Haojun Chen
%A Taher Jafferjee
%A Jianhong Wang
%A Longfei Yue
%A Xidong Feng
%A Stephen Marcus Mcaleer
%A Feifei Tong
%A Jun Wang
%A Yaodong Yang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-mguni23a
%I PMLR
%P 24631--24658
%U https://proceedings.mlr.press/v202/mguni23a.html
%V 202
%X In multi-agent reinforcement learning (MARL), independent learning (IL) often shows remarkable performance and easily scales with the number of agents. Yet, using IL can be inefficient and risks failing to train successfully, particularly in scenarios that require agents to coordinate their actions. Using centralised learning (CL) enables MARL agents to quickly learn how to coordinate their behaviour, but employing CL everywhere is often prohibitively expensive in real-world applications. Besides, using CL in value-based methods often needs strong representational constraints (e.g. the individual-global-max condition) that can lead to poor performance if violated. In this paper, we introduce a novel plug & play IL framework named Multi-Agent Network Selection Algorithm (MANSA) which selectively employs CL only at states that require coordination. At its core, MANSA has an additional agent that uses switching controls to quickly learn the best states to activate CL during training, using CL only where necessary and vastly reducing the computational burden of CL. Our theory proves MANSA preserves cooperative MARL convergence properties, boosts IL performance and can optimally make use of a fixed budget on the number of CL calls. We show empirically in Level-based Foraging (LBF) and StarCraft Multi-agent Challenge (SMAC) that MANSA achieves fast, superior and more reliable performance while making 40% fewer CL calls in SMAC and using only 1% of CL calls in LBF.

APA


Mguni, D.H., Chen, H., Jafferjee, T., Wang, J., Yue, L., Feng, X., Mcaleer, S.M., Tong, F., Wang, J. & Yang, Y. (2023). MANSA: Learning Fast and Slow in Multi-Agent Systems. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:24631-24658. Available from https://proceedings.mlr.press/v202/mguni23a.html.
