Sable: a Performant, Efficient and Scalable Sequence Model for MARL

Omayma Mahjoub, Sasha Abramowitz, Ruan John De Kock, Wiem Khlifi, Simon Verster Du Toit, Jemma Daniel, Louay Ben Nessir, Louise Beyers, Juan Claude Formanek, Liam Clark, Arnu Pretorius
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:42579-42614, 2025.

Abstract

As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modelling approach to MARL. Sable works by adapting the retention mechanism in Retentive Networks (Sun et al., 2023) to achieve computationally efficient processing of multi-agent observations with long context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable’s performance gains and confirm its efficient computational memory usage. All experimental data, hyperparameters, and code for a frozen version of Sable used in this paper are available on our website: https://sites.google.com/view/sable-marl. An improved and maintained version of Sable is available in Mava: https://github.com/instadeepai/Mava.
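For readers unfamiliar with retention, the mechanism Sable adapts admits a recurrent form (Sun et al., 2023): a state S_n = γ·S_{n-1} + k_n^T v_n is updated at each step and read out as o_n = q_n S_n, so per-step memory is constant in sequence length, which is the property behind the linear memory scaling reported above. Below is a minimal JAX sketch of that recurrence under illustrative names (retention_step, retention, gamma); it is not the Sable implementation, which lives in the Mava repository linked above.

import jax
import jax.numpy as jnp

def retention_step(state, inputs, gamma=0.9):
    """One step of the recurrent retention form from Sun et al. (2023):
    S_n = gamma * S_{n-1} + k_n^T v_n,   o_n = q_n @ S_n.
    `state` has shape (d_k, d_v); q, k have shape (d_k,), v has shape (d_v,)."""
    q, k, v = inputs
    state = gamma * state + jnp.outer(k, v)  # exponentially decayed state update
    out = q @ state                          # read-out for this step
    return state, out

def retention(qs, ks, vs, gamma=0.9):
    """Scan the recurrence over a sequence; memory is O(d_k * d_v),
    independent of sequence length."""
    d_k, d_v = qs.shape[-1], vs.shape[-1]
    init = jnp.zeros((d_k, d_v))
    step = lambda s, x: retention_step(s, x, gamma)
    _, outs = jax.lax.scan(step, init, (qs, ks, vs))
    return outs

# Tiny usage example with random projections of a toy sequence of length 8.
key = jax.random.PRNGKey(0)
qs, ks, vs = (jax.random.normal(k, (8, 4)) for k in jax.random.split(key, 3))
print(retention(qs, ks, vs).shape)  # (8, 4)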

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-mahjoub25a,
  title     = {Sable: a Performant, Efficient and Scalable Sequence Model for {MARL}},
  author    = {Mahjoub, Omayma and Abramowitz, Sasha and De Kock, Ruan John and Khlifi, Wiem and Toit, Simon Verster Du and Daniel, Jemma and Nessir, Louay Ben and Beyers, Louise and Formanek, Juan Claude and Clark, Liam and Pretorius, Arnu},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {42579--42614},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/mahjoub25a/mahjoub25a.pdf},
  url       = {https://proceedings.mlr.press/v267/mahjoub25a.html},
  abstract  = {As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modelling approach to MARL. Sable works by adapting the retention mechanism in Retentive Networks (Sun et al., 2023) to achieve computationally efficient processing of multi-agent observations with long context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable’s performance gains and confirm its efficient computational memory usage. All experimental data, hyperparameters, and code for a frozen version of Sable used in this paper are available on our website: https://sites.google.com/view/sable-marl. An improved and maintained version of Sable is available in Mava: https://github.com/instadeepai/Mava.}
}
Endnote
%0 Conference Paper
%T Sable: a Performant, Efficient and Scalable Sequence Model for MARL
%A Omayma Mahjoub
%A Sasha Abramowitz
%A Ruan John De Kock
%A Wiem Khlifi
%A Simon Verster Du Toit
%A Jemma Daniel
%A Louay Ben Nessir
%A Louise Beyers
%A Juan Claude Formanek
%A Liam Clark
%A Arnu Pretorius
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-mahjoub25a
%I PMLR
%P 42579--42614
%U https://proceedings.mlr.press/v267/mahjoub25a.html
%V 267
%X As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modelling approach to MARL. Sable works by adapting the retention mechanism in Retentive Networks (Sun et al., 2023) to achieve computationally efficient processing of multi-agent observations with long context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable’s performance gains and confirm its efficient computational memory usage. All experimental data, hyperparameters, and code for a frozen version of Sable used in this paper are available on our website: https://sites.google.com/view/sable-marl. An improved and maintained version of Sable is available in Mava: https://github.com/instadeepai/Mava.
APA
Mahjoub, O., Abramowitz, S., De Kock, R.J., Khlifi, W., Toit, S.V.D., Daniel, J., Nessir, L.B., Beyers, L., Formanek, J.C., Clark, L. & Pretorius, A. (2025). Sable: a Performant, Efficient and Scalable Sequence Model for MARL. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:42579-42614. Available from https://proceedings.mlr.press/v267/mahjoub25a.html.