Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto, Mohammad Sami Nur Islam, Martin Klissarov, Doina Precup, Sherry Yang, Ankit Anand
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:49368-49387, 2024.

Abstract

Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slow down the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.
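The abstract does not include the generated code itself; the sketch below is an illustration only (not the authors' implementation) of the general idea: the VLM is queried once, offline, to emit a small reward program, and that program is then called cheaply on every step of RL training instead of the VLM. The function name `vlm_generated_reward`, its pixel heuristic, and the chosen environment are hypothetical.

```python
# Illustrative sketch only -- not the paper's code. A VLM is asked (offline) to
# write a program that scores image observations; that program then replaces
# per-step VLM queries during RL training.

import numpy as np
import gymnasium as gym


def vlm_generated_reward(frame: np.ndarray) -> float:
    """Hypothetical stand-in for a reward function authored by the VLM.

    A real generated function would inspect the image observation (e.g. locate
    objects, check sub-task completion) and return a dense reward.
    """
    # Placeholder heuristic purely for illustration: fraction of bright pixels.
    return float((frame > 200).mean())


class CodeAsRewardWrapper(gym.Wrapper):
    """Substitutes the environment's sparse reward with the generated dense reward."""

    def step(self, action):
        obs, _sparse_reward, terminated, truncated, info = self.env.step(action)
        dense_reward = vlm_generated_reward(obs)
        return obs, dense_reward, terminated, truncated, info


# Usage (example environment; requires the gymnasium box2d extra):
# env = CodeAsRewardWrapper(gym.make("CarRacing-v2"))
```

Because the generated reward is ordinary code, it runs at environment speed, which is what removes the VLM-inference bottleneck described in the abstract.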

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-venuto24a,
  title = {Code as Reward: Empowering Reinforcement Learning with {VLM}s},
  author = {Venuto, David and Islam, Mohammad Sami Nur and Klissarov, Martin and Precup, Doina and Yang, Sherry and Anand, Ankit},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {49368--49387},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/venuto24a/venuto24a.pdf},
  url = {https://proceedings.mlr.press/v235/venuto24a.html},
  abstract = {Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slow down the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.}
}
Endnote
%0 Conference Paper
%T Code as Reward: Empowering Reinforcement Learning with VLMs
%A David Venuto
%A Mohammad Sami Nur Islam
%A Martin Klissarov
%A Doina Precup
%A Sherry Yang
%A Ankit Anand
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-venuto24a
%I PMLR
%P 49368--49387
%U https://proceedings.mlr.press/v235/venuto24a.html
%V 235
%X Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slow down the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.
APA
Venuto, D., Islam, M.S.N., Klissarov, M., Precup, D., Yang, S. & Anand, A. (2024). Code as Reward: Empowering Reinforcement Learning with VLMs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:49368-49387. Available from https://proceedings.mlr.press/v235/venuto24a.html.
