Guiding Pretraining in Reinforcement Learning with Large Language Models

Yuqing Du; Olivia Watkins; Zihan Wang; Cédric Colas; Trevor Darrell; Pieter Abbeel; Abhishek Gupta; Jacob Andreas

Guiding Pretraining in Reinforcement Learning with Large Language Models

Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:8657-8677, 2023.

Abstract

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs) rewards an agent for achieving goals suggested by a language model prompted with a description of the agent’s current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-du23f,
  title = 	 {Guiding Pretraining in Reinforcement Learning with Large Language Models},
  author =       {Du, Yuqing and Watkins, Olivia and Wang, Zihan and Colas, C\'{e}dric and Darrell, Trevor and Abbeel, Pieter and Gupta, Abhishek and Andreas, Jacob},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {8657--8677},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/du23f/du23f.pdf},
  url = 	 {https://proceedings.mlr.press/v202/du23f.html},
  abstract = 	 {Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs) rewards an agent for achieving goals suggested by a language model prompted with a description of the agent’s current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks.}
}

Endnote

%0 Conference Paper
%T Guiding Pretraining in Reinforcement Learning with Large Language Models
%A Yuqing Du
%A Olivia Watkins
%A Zihan Wang
%A Cédric Colas
%A Trevor Darrell
%A Pieter Abbeel
%A Abhishek Gupta
%A Jacob Andreas
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-du23f
%I PMLR
%P 8657--8677
%U https://proceedings.mlr.press/v202/du23f.html
%V 202
%X Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs) rewards an agent for achieving goals suggested by a language model prompted with a description of the agent’s current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks.

APA


Du, Y., Watkins, O., Wang, Z., Colas, C., Darrell, T., Abbeel, P., Gupta, A. & Andreas, J.. (2023). Guiding Pretraining in Reinforcement Learning with Large Language Models. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:8657-8677 Available from https://proceedings.mlr.press/v202/du23f.html.

Guiding Pretraining in Reinforcement Learning with Large Language Models

Abstract

Cite this Paper

Related Material