Prioritized Level Replay

Minqi Jiang; Edward Grefenstette; Tim Rocktäschel

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:4940-4950, 2021.

Abstract

Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned from a level depends on the current policy, yet prior work defaults to uniform sampling of training levels independently of the policy. We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future. We show TD-errors effectively estimate a level’s future learning potential and, when used to guide the sampling procedure, induce an emergent curriculum of increasingly difficult levels. By adapting the sampling of training levels, PLR significantly improves sample-efficiency and generalization on Procgen Benchmark—matching the previous state-of-the-art in test return—and readily combines with other methods. Combined with the previous leading method, PLR raises the state-of-the-art to over 76% improvement in test return relative to standard RL baselines.
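The sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that TD-errors estimate a level's learning potential and guide sampling, so the rank-based weighting, the `temperature` parameter, and all function names below are assumptions chosen for illustration.

```python
import random


def mean_abs_td_error(rewards, values, gamma=0.99):
    """Average |r_t + gamma * V(s_{t+1}) - V(s_t)| over one episode.

    `values` must hold one more entry than `rewards`: the last entry is the
    value estimate of the final state (0.0 if the episode terminated).
    """
    td_errors = [
        r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)
    ]
    return sum(abs(d) for d in td_errors) / len(td_errors)


def replay_probabilities(scores, temperature=1.0):
    """Map per-level scores to sampling probabilities.

    Assumption: levels are weighted by 1 / rank (rank 1 = highest score),
    sharpened or flattened by `temperature`, then normalized.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    weights = {
        level: (1.0 / rank) ** (1.0 / temperature)
        for rank, level in enumerate(ordered, start=1)
    }
    total = sum(weights.values())
    return {level: w / total for level, w in weights.items()}


def sample_level(scores, rng=random, temperature=1.0):
    """Draw the next training level, favoring high-score levels."""
    probs = replay_probabilities(scores, temperature)
    levels = list(probs)
    return rng.choices(levels, weights=[probs[l] for l in levels], k=1)[0]
```

In a training loop, one would update `scores[level]` with `mean_abs_td_error` after each episode on that level and call `sample_level(scores)` to pick the next one, in place of uniform sampling.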

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-jiang21b,
  title     = {Prioritized Level Replay},
  author    = {Jiang, Minqi and Grefenstette, Edward and Rockt{\"a}schel, Tim},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {4940--4950},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/jiang21b/jiang21b.pdf},
  url       = {https://proceedings.mlr.press/v139/jiang21b.html},
  abstract = 	 {Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned from a level depends on the current policy, yet prior work defaults to uniform sampling of training levels independently of the policy. We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future. We show TD-errors effectively estimate a level’s future learning potential and, when used to guide the sampling procedure, induce an emergent curriculum of increasingly difficult levels. By adapting the sampling of training levels, PLR significantly improves sample-efficiency and generalization on Procgen Benchmark—matching the previous state-of-the-art in test return—and readily combines with other methods. Combined with the previous leading method, PLR raises the state-of-the-art to over 76% improvement in test return relative to standard RL baselines.}
}

Endnote

%0 Conference Paper
%T Prioritized Level Replay
%A Minqi Jiang
%A Edward Grefenstette
%A Tim Rocktäschel
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-jiang21b
%I PMLR
%P 4940--4950
%U https://proceedings.mlr.press/v139/jiang21b.html
%V 139
%X Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned from a level depends on the current policy, yet prior work defaults to uniform sampling of training levels independently of the policy. We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future. We show TD-errors effectively estimate a level’s future learning potential and, when used to guide the sampling procedure, induce an emergent curriculum of increasingly difficult levels. By adapting the sampling of training levels, PLR significantly improves sample-efficiency and generalization on Procgen Benchmark—matching the previous state-of-the-art in test return—and readily combines with other methods. Combined with the previous leading method, PLR raises the state-of-the-art to over 76% improvement in test return relative to standard RL baselines.

APA

Jiang, M., Grefenstette, E. & Rocktäschel, T. (2021). Prioritized Level Replay. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:4940-4950. Available from https://proceedings.mlr.press/v139/jiang21b.html.
