Continuous Coordination As a Realistic Scenario for Lifelong Learning
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8016-8024, 2021.
Abstract
Current deep reinforcement learning (RL) algorithms are still highly task-specific and lack the ability to generalize to new environments. Lifelong learning (LLL), in contrast, aims to solve multiple tasks sequentially by efficiently transferring and reusing knowledge between tasks. Despite a surge of interest in lifelong RL in recent years, the lack of a realistic testbed makes robust evaluation of LLL algorithms difficult. Multi-agent RL (MARL), on the other hand, can be seen as a natural scenario for lifelong RL because of its inherent non-stationarity: the agents' policies change over time. In this work, we introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings. Our setup is based on Hanabi, a partially observable, fully cooperative multi-agent game that has been shown to be challenging for zero-shot coordination. Its large strategy space makes it a desirable environment for lifelong RL tasks. We evaluate several recent MARL methods and benchmark state-of-the-art LLL algorithms in limited memory and computation regimes to shed light on their strengths and weaknesses. This continual learning paradigm also provides a pragmatic way of going beyond centralized training, which is the most commonly used training protocol in MARL. We empirically show that agents trained in our setup are able to coordinate well with unseen agents without the additional assumptions made by previous works. The code and all pre-trained models are available at https://github.com/chandar-lab/Lifelong-Hanabi.
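Below is a minimal sketch of what the zero-shot cross-play evaluation described above might look like in code: a fixed "ego" agent is paired with partners it never trained with, and the average joint game score is recorded. It assumes the DeepMind hanabi_learning_environment package; the RandomAgent class, the act() interface, and the cross_play helper are illustrative placeholders, not the actual Lifelong-Hanabi API.

```python
# Hedged sketch of zero-shot cross-play evaluation on Hanabi.
# Assumes the DeepMind hanabi_learning_environment package is installed;
# RandomAgent stands in for the pre-trained policies shipped with the
# Lifelong-Hanabi repository and is NOT its actual interface.
import random
from hanabi_learning_environment import rl_env


class RandomAgent:
    """Placeholder policy: picks uniformly among the legal moves."""

    def act(self, observation):
        return random.choice(observation["legal_moves"])


def play_episode(env, agents):
    """Roll out one game with a fixed pairing of agents; return the final score."""
    obs = env.reset()
    done, score = False, 0.0
    while not done:
        current = obs["current_player"]
        player_obs = obs["player_observations"][current]
        action = agents[current].act(player_obs)
        obs, reward, done, _ = env.step(action)
        score += reward  # cumulative step rewards track the fireworks score
    return max(score, 0.0)


def cross_play(ego_agent, partners, num_games=100):
    """Average score of ego_agent when paired with each (unseen) partner."""
    env = rl_env.make(environment_name="Hanabi-Full", num_players=2)
    return {
        name: sum(play_episode(env, [ego_agent, partner])
                  for _ in range(num_games)) / num_games
        for name, partner in partners.items()
    }


if __name__ == "__main__":
    # Toy usage: pair a random ego agent with a random "unseen" partner.
    print(cross_play(RandomAgent(), {"partner_A": RandomAgent()}, num_games=10))
```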