FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Yuwei Fu; Haichao Zhang; Di Wu; Wei Xu; Benoit Boulet

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:14256-14274, 2024.

Abstract

In this work, we investigate how to leverage pre-trained visual-language models (VLMs) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying a VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at https://github.com/fuyw/FuRL.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-fu24j,
  title = {{F}u{RL}: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning},
  author = {Fu, Yuwei and Zhang, Haichao and Wu, Di and Xu, Wei and Boulet, Benoit},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {14256--14274},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/fu24j/fu24j.pdf},
  url = {https://proceedings.mlr.press/v235/fu24j.html},
  abstract = 	 {In this work, we investigate how to leverage pre-trained visual-language models (VLM) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.}
}

Endnote

%0 Conference Paper
%T FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
%A Yuwei Fu
%A Haichao Zhang
%A Di Wu
%A Wei Xu
%A Benoit Boulet
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-fu24j
%I PMLR
%P 14256--14274
%U https://proceedings.mlr.press/v235/fu24j.html
%V 235
%X In this work, we investigate how to leverage pre-trained visual-language models (VLM) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.

APA


Fu, Y., Zhang, H., Wu, D., Xu, W., & Boulet, B. (2024). FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:14256-14274. Available from https://proceedings.mlr.press/v235/fu24j.html.
