Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Thomas Jackson, Samuel Coward, Jakob Nicolaus Foerster
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:35104-35137, 2024.

Abstract

Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired by NetHack. Solving Craftax requires deep exploration, long-term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods, including global and episodic exploration as well as unsupervised environment design, fail to make material progress on the benchmark. We therefore believe that Craftax can, for the first time, allow researchers to experiment in a complex, open-ended environment with limited computational resources.
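The reported speed-up over Python-native Crafter comes from writing the environment end-to-end in JAX, so that reset and step are pure array functions that can be vmapped over thousands of parallel environments and jit-compiled onto a single GPU. The sketch below illustrates that pattern with a hypothetical toy grid environment; the state fields, dynamics, and reward are placeholders for illustration and are not Craftax's actual mechanics or API.

import jax
import jax.numpy as jnp
from typing import NamedTuple, Tuple


class EnvState(NamedTuple):
    # Hypothetical stand-in for a JAX-native environment state; Craftax's real
    # state holds the full world map, inventory, mobs, dungeon floors, etc.
    pos: jnp.ndarray  # (2,) agent position on the grid
    t: jnp.ndarray    # () timestep counter


def reset(rng: jax.Array) -> EnvState:
    # Placeholder reset: random starting position on a 64x64 grid.
    pos = jax.random.randint(rng, (2,), 0, 64)
    return EnvState(pos=pos, t=jnp.array(0))


def step(state: EnvState, action: jnp.ndarray) -> Tuple[EnvState, jnp.ndarray]:
    # Placeholder dynamics: move in one of four cardinal directions;
    # reward 1.0 only when the agent reaches the corner cell (0, 0).
    moves = jnp.array([[0, 1], [0, -1], [1, 0], [-1, 0]])
    pos = jnp.clip(state.pos + moves[action], 0, 63)
    reward = jnp.where(jnp.all(pos == 0), 1.0, 0.0)
    return EnvState(pos=pos, t=state.t + 1), reward


# The key pattern: because reset/step are pure functions of arrays, they can be
# vmapped across a batch of environments and jit-compiled into fused GPU kernels,
# which is where the large wall-clock speed-up over Python-native Crafter comes from.
batched_reset = jax.jit(jax.vmap(reset))
batched_step = jax.jit(jax.vmap(step))

num_envs = 1024
rng, reset_rng, action_rng = jax.random.split(jax.random.PRNGKey(0), 3)
states = batched_reset(jax.random.split(reset_rng, num_envs))
actions = jax.random.randint(action_rng, (num_envs,), 0, 4)
states, rewards = batched_step(states, actions)
print(rewards.shape)  # (1024,)

The same structure is what allows the full training loop (rollouts, advantage estimation and gradient updates) to be compiled as one program, which is consistent with the abstract's claim that a 1-billion-step PPO run finishes in under an hour on a single GPU.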

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-matthews24a,
  title     = {Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning},
  author    = {Matthews, Michael and Beukman, Michael and Ellis, Benjamin and Samvelyan, Mikayel and Jackson, Matthew Thomas and Coward, Samuel and Foerster, Jakob Nicolaus},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {35104--35137},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/matthews24a/matthews24a.pdf},
  url       = {https://proceedings.mlr.press/v235/matthews24a.html},
  abstract  = {Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired from NetHack. Solving Craftax requires deep exploration, long term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods including global and episodic exploration, as well as unsupervised environment design fail to make material progress on the benchmark. We therefore believe that Craftax can for the first time allow researchers to experiment in a complex, open-ended environment with limited computational resources.}
}
Endnote
%0 Conference Paper
%T Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning
%A Michael Matthews
%A Michael Beukman
%A Benjamin Ellis
%A Mikayel Samvelyan
%A Matthew Thomas Jackson
%A Samuel Coward
%A Jakob Nicolaus Foerster
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-matthews24a
%I PMLR
%P 35104--35137
%U https://proceedings.mlr.press/v235/matthews24a.html
%V 235
%X Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms. We identify that existing benchmarks used for research into open-ended learning fall into one of two categories. Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen. To remedy this, we first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired from NetHack. Solving Craftax requires deep exploration, long term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods including global and episodic exploration, as well as unsupervised environment design fail to make material progress on the benchmark. We therefore believe that Craftax can for the first time allow researchers to experiment in a complex, open-ended environment with limited computational resources.
APA
Matthews, M., Beukman, M., Ellis, B., Samvelyan, M., Jackson, M.T., Coward, S. & Foerster, J.N. (2024). Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:35104-35137. Available from https://proceedings.mlr.press/v235/matthews24a.html.
