Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft

Christian Scheller, Yanick Schraner, Manfred Vogel
Proceedings of the NeurIPS 2019 Competition and Demonstration Track, PMLR 123:67-76, 2020.

Abstract

Sample inefficiency of deep reinforcement learning methods is a major obstacle to their use in real-world applications. In this work, we show how human demonstrations can improve the final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction. We propose a training procedure in which policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay, and an additional loss against catastrophic forgetting, our best agent achieved a mean score of 48. Our solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.
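To make the two-stage recipe in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: a policy is first trained by behavioral cloning on demonstration data, then fine-tuned with a simple policy-gradient loss plus a KL penalty toward the frozen pretrained policy as one possible form of the "loss against catastrophic forgetting". All names (PolicyNet, pretrain_step, finetune_step, kl_coef), the network architecture, and the vanilla policy-gradient objective are illustrative assumptions; the actual submission's losses, network, and distributed training setup differ.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PolicyNet(nn.Module):
        """Small conv policy over 64x64 RGB observations (illustrative only)."""
        def __init__(self, num_actions: int):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
            )
            # 64x64 input -> 15x15 after conv1 -> 6x6 after conv2
            self.logits = nn.Linear(64 * 6 * 6, num_actions)

        def forward(self, obs):
            return self.logits(self.encoder(obs))

    def pretrain_step(policy, optimizer, demo_obs, demo_actions):
        """Stage 1: behavioral cloning on human demonstrations
        (supervised cross-entropy on the demonstrated actions)."""
        loss = F.cross_entropy(policy(demo_obs), demo_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def finetune_step(policy, frozen_pretrained, optimizer,
                      obs, actions, advantages, kl_coef=0.5):
        """Stage 2: RL fine-tuning. Policy-gradient loss plus a KL penalty
        anchoring the policy to the frozen pretrained network, which limits
        catastrophic forgetting of the demonstration-derived behavior."""
        log_probs = F.log_softmax(policy(obs), dim=-1)
        taken = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
        pg_loss = -(taken * advantages).mean()
        with torch.no_grad():
            ref_log_probs = F.log_softmax(frozen_pretrained(obs), dim=-1)
        # KL(pretrained || current), averaged over the batch
        kl = F.kl_div(log_probs, ref_log_probs,
                      log_target=True, reduction="batchmean")
        loss = pg_loss + kl_coef * kl
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Here frozen_pretrained would be a copy of the network taken after the cloning phase with gradients disabled; advantages stand in for whatever advantage estimate the RL algorithm produces. The paper itself should be consulted for the actual architecture, the policy exploitation mechanism, and the experience-replay scheme.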

Cite this Paper


BibTeX
@InProceedings{pmlr-v123-scheller20a,
  title     = {Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft},
  author    = {Scheller, Christian and Schraner, Yanick and Vogel, Manfred},
  booktitle = {Proceedings of the NeurIPS 2019 Competition and Demonstration Track},
  pages     = {67--76},
  year      = {2020},
  editor    = {Escalante, Hugo Jair and Hadsell, Raia},
  volume    = {123},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--14 Dec},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v123/scheller20a/scheller20a.pdf},
  url       = {https://proceedings.mlr.press/v123/scheller20a.html}
}
