FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:14256-14274, 2024.

Abstract

In this work, we investigate how to leverage pre-trained visual-language models (VLM) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.
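
The core idea — scoring each observation against the textual task description with a pre-trained VLM and using the noisy (hence "fuzzy") image-text similarity as an auxiliary reward — can be sketched as follows. This is a minimal illustration assuming a frozen CLIP model loaded via the Hugging Face transformers library; the paper's actual implementation (see the linked repository) additionally fine-tunes the VLM representations for reward alignment and interleaves relay RL, which this sketch omits.

    # Minimal sketch: cosine similarity between a rendered frame and the task
    # description under a frozen CLIP model, used as a dense "fuzzy" reward on
    # top of the environment's sparse reward. Illustrative only; FuRL itself
    # also fine-tunes the VLM representations and uses relay RL.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    @torch.no_grad()
    def vlm_fuzzy_reward(frame, task_description: str) -> float:
        """Score one RGB frame (H x W x 3 uint8 array) against the task text."""
        inputs = processor(text=[task_description], images=frame,
                           return_tensors="pt", padding=True)
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        return (img @ txt.T).item()  # cosine similarity in [-1, 1]

    # Inside the RL loop, the shaped reward might combine both signals, e.g.
    #   reward = env_reward + beta * vlm_fuzzy_reward(obs, "open the drawer")
    # where beta is a hypothetical weighting coefficient, not from the paper.

Because the raw similarity is only loosely aligned with task progress (the reward misalignment the paper identifies), treating it as a shaping bonus rather than a replacement for the sparse reward is what motivates FuRL's alignment and relay components.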

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-fu24j,
  title     = {{F}u{RL}: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning},
  author    = {Fu, Yuwei and Zhang, Haichao and Wu, Di and Xu, Wei and Boulet, Benoit},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {14256--14274},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/fu24j/fu24j.pdf},
  url       = {https://proceedings.mlr.press/v235/fu24j.html},
  abstract  = {In this work, we investigate how to leverage pre-trained visual-language models (VLM) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.}
}
Endnote
%0 Conference Paper
%T FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
%A Yuwei Fu
%A Haichao Zhang
%A Di Wu
%A Wei Xu
%A Benoit Boulet
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-fu24j
%I PMLR
%P 14256--14274
%U https://proceedings.mlr.press/v235/fu24j.html
%V 235
%X In this work, we investigate how to leverage pre-trained visual-language models (VLM) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.
APA
Fu, Y., Zhang, H., Wu, D., Xu, W. & Boulet, B. (2024). FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:14256-14274. Available from https://proceedings.mlr.press/v235/fu24j.html.
