Mix & Match Agent Curricula for Reinforcement Learning

Wojciech Czarnecki; Siddhant Jayakumar; Max Jaderberg; Leonard Hasenclever; Yee Whye Teh; Nicolas Heess; Simon Osindero; Razvan Pascanu

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1087-1095, 2018.

Abstract

We introduce Mix and match (M&M) – a training framework designed to facilitate rapid and effective learning in RL agents that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contradistinction to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum we achieve both faster training and better final performance than one obtains using traditional methods. (2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agent's internal state. (3) Finally, we illustrate how a variant of our method can be used to improve agent performance in a multitask setting.

Cite this Paper

BibTeX


@InProceedings{pmlr-v80-czarnecki18a,
  title = 	 {Mix \& Match Agent Curricula for Reinforcement Learning},
  author =       {Czarnecki, Wojciech and Jayakumar, Siddhant and Jaderberg, Max and Hasenclever, Leonard and Teh, Yee Whye and Heess, Nicolas and Osindero, Simon and Pascanu, Razvan},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning},
  pages = 	 {1087--1095},
  year = 	 {2018},
  editor = 	 {Dy, Jennifer and Krause, Andreas},
  volume = 	 {80},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--15 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/czarnecki18a/czarnecki18a.pdf},
  url = 	 {https://proceedings.mlr.press/v80/czarnecki18a.html},
  abstract = 	 {We introduce Mix and match (M\&M) – a training framework designed to facilitate rapid and effective learning in RL agents that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contradistinction to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum we achieve both faster training and better final performance than one obtains using traditional methods. (2) We further show that M\&M can be used successfully to progress through a curriculum of architectural variants defining an agent's internal state. (3) Finally, we illustrate how a variant of our method can be used to improve agent performance in a multitask setting.}
}

Endnote

%0 Conference Paper
%T Mix & Match Agent Curricula for Reinforcement Learning
%A Wojciech Czarnecki
%A Siddhant Jayakumar
%A Max Jaderberg
%A Leonard Hasenclever
%A Yee Whye Teh
%A Nicolas Heess
%A Simon Osindero
%A Razvan Pascanu
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-czarnecki18a
%I PMLR
%P 1087--1095
%U https://proceedings.mlr.press/v80/czarnecki18a.html
%V 80
%X We introduce Mix and match (M&M) – a training framework designed to facilitate rapid and effective learning in RL agents that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contradistinction to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum we achieve both faster training and better final performance than one obtains using traditional methods. (2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agent's internal state. (3) Finally, we illustrate how a variant of our method can be used to improve agent performance in a multitask setting.

APA


Czarnecki, W., Jayakumar, S., Jaderberg, M., Hasenclever, L., Teh, Y.W., Heess, N., Osindero, S. & Pascanu, R. (2018). Mix & Match Agent Curricula for Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1087-1095. Available from https://proceedings.mlr.press/v80/czarnecki18a.html.
