Reinforcement Learning with Foundation Priors: Let Embodied Agent Efficiently Learn on Its Own

Weirui Ye, Yunsheng Zhang, Haoyang Weng, Xianfan Gu, Shengjie Wang, Tong Zhang, Mengchen Wang, Pieter Abbeel, Yang Gao
Proceedings of The 8th Conference on Robot Learning, PMLR 270:185-208, 2025.

Abstract

Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, applying RL algorithms directly in the real world is challenging. For one thing, RL is data-intensive and typically requires millions of environment interactions, which is impractical in real scenarios. For another, designing reward functions by hand demands heavy engineering effort. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) sample efficiency; (2) minimal and effective reward engineering; (3) agnosticism to foundation model forms and robustness to noisy priors. Our method achieves strong performance on various manipulation tasks, both on real robots and in simulation. Across 5 dexterous tasks on real robots, FAC achieves an average success rate of 86% after one hour of real-time learning. Across 8 tasks in the simulated Meta-World benchmark, FAC achieves 100% success rates on 7/8 tasks within 100k frames (about 1 hour of training), outperforming baseline methods that use manually designed rewards and 1M frames. We believe the RLFP framework can enable future robots to explore and learn autonomously in the physical world across more tasks.
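The abstract only names the three kinds of foundation priors (policy, value, success-reward). The sketch below is a minimal illustration of how such priors could plug into an ordinary actor-critic rollout: the hypothetical policy_prior guides action selection, the value_prior shapes the sparse success signal, and success_reward provides automatic feedback. All interfaces, the guidance coefficient beta, and the potential-based shaping term are assumptions made for illustration only, not the paper's FAC algorithm.

# Illustrative sketch only: how policy, value, and success-reward foundation
# priors (as described in the abstract) might feed a generic actor-critic
# data-collection loop. Every interface here is a hypothetical stand-in.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 2


def policy_prior(obs):
    """Hypothetical policy foundation model: proposes a rough action for obs."""
    return np.tanh(obs[:ACT_DIM])          # stand-in for a learned prior


def value_prior(obs):
    """Hypothetical value foundation model: coarse task-progress estimate."""
    return float(-np.linalg.norm(obs))     # closer to the goal => higher value


def success_reward(obs):
    """Hypothetical success-reward foundation model: sparse 0/1 feedback."""
    return 1.0 if np.linalg.norm(obs) < 0.1 else 0.0


def env_step(obs, act):
    """Toy stand-in environment: actions nudge the state toward the origin."""
    nxt = obs.copy()
    nxt[:ACT_DIM] += 0.1 * act
    nxt[ACT_DIM:] *= 0.95
    return nxt, success_reward(nxt)


def run_episode(actor_params, horizon=50, beta=0.5, sigma=0.1):
    """One rollout in which the actor is softly guided by the three priors."""
    obs, transitions = rng.normal(size=OBS_DIM), []
    for _ in range(horizon):
        own_action = np.tanh(actor_params @ obs)              # current policy
        guided = (1.0 - beta) * own_action + beta * policy_prior(obs)
        act = guided + sigma * rng.normal(size=ACT_DIM)       # exploration noise
        nxt, reward = env_step(obs, act)
        # Assumed shaping: sparse success signal plus a potential-based term
        # from the value prior, so exploration receives dense feedback.
        shaped = reward + 0.1 * (value_prior(nxt) - value_prior(obs))
        transitions.append((obs, act, shaped, nxt))
        obs = nxt
        if reward > 0:                                        # task solved
            break
    return transitions


if __name__ == "__main__":
    # A random linear actor; a real agent would update it from `transitions`
    # with any standard actor-critic method.
    actor = 0.1 * rng.normal(size=(ACT_DIM, OBS_DIM))
    print(len(run_episode(actor)), "transitions collected")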

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-ye25a,
  title     = {Reinforcement Learning with Foundation Priors: Let Embodied Agent Efficiently Learn on Its Own},
  author    = {Ye, Weirui and Zhang, Yunsheng and Weng, Haoyang and Gu, Xianfan and Wang, Shengjie and Zhang, Tong and Wang, Mengchen and Abbeel, Pieter and Gao, Yang},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {185--208},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/ye25a/ye25a.pdf},
  url       = {https://proceedings.mlr.press/v270/ye25a.html},
  abstract  = {Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, it is challenging to apply the RL algorithms directly in the real world. For one thing, RL is data-intensive and typically requires millions of interactions with environments, which are impractical in real scenarios. For another, it is necessary to make heavy engineering efforts to design reward functions manually. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) \textit{sample efficient}; (2) \textit{minimal and effective reward engineering}; (3) \textit{agnostic to foundation model forms and robust to noisy priors}. Our method achieves remarkable performances in various manipulation tasks on both real robots and in simulation. Across 5 dexterous tasks with real robots, FAC achieves an average success rate of 86% after one hour of real-time learning. Across 8 tasks in the simulated Meta-world, FAC achieves 100% success rates in 7/8 tasks under less than 100k frames (about 1-hour training), outperforming baseline methods with manual-designed rewards in 1M frames. We believe the RLFP framework can enable future robots to explore and learn autonomously in the physical world for more tasks.}
}
Endnote
%0 Conference Paper
%T Reinforcement Learning with Foundation Priors: Let Embodied Agent Efficiently Learn on Its Own
%A Weirui Ye
%A Yunsheng Zhang
%A Haoyang Weng
%A Xianfan Gu
%A Shengjie Wang
%A Tong Zhang
%A Mengchen Wang
%A Pieter Abbeel
%A Yang Gao
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-ye25a
%I PMLR
%P 185--208
%U https://proceedings.mlr.press/v270/ye25a.html
%V 270
%X Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, it is challenging to apply the RL algorithms directly in the real world. For one thing, RL is data-intensive and typically requires millions of interactions with environments, which are impractical in real scenarios. For another, it is necessary to make heavy engineering efforts to design reward functions manually. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) \textit{sample efficient}; (2) \textit{minimal and effective reward engineering}; (3) \textit{agnostic to foundation model forms and robust to noisy priors}. Our method achieves remarkable performances in various manipulation tasks on both real robots and in simulation. Across 5 dexterous tasks with real robots, FAC achieves an average success rate of 86% after one hour of real-time learning. Across 8 tasks in the simulated Meta-world, FAC achieves 100% success rates in 7/8 tasks under less than 100k frames (about 1-hour training), outperforming baseline methods with manual-designed rewards in 1M frames. We believe the RLFP framework can enable future robots to explore and learn autonomously in the physical world for more tasks.
APA
Ye, W., Zhang, Y., Weng, H., Gu, X., Wang, S., Zhang, T., Wang, M., Abbeel, P. & Gao, Y. (2025). Reinforcement Learning with Foundation Priors: Let Embodied Agent Efficiently Learn on Its Own. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:185-208. Available from https://proceedings.mlr.press/v270/ye25a.html.
