RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning

Liam Boyle, Nicolas Baumann, Paviththiren Sivasothilingam, Michele Magno, Luca Benini
Proceedings of The 9th Conference on Robot Learning, PMLR 305:4074-4092, 2025.

Abstract

Future robotic systems operating in real-world environments require on-board embodied intelligence without continuous cloud connection, balancing capabilities with constraints on computational power and memory. This work presents an extension of the R1-Zero approach that enables the use of small parameter-count Large Language Models (LLMs) in the robotic domain. The R1-Zero approach was originally developed to enable mathematical reasoning in LLMs using static datasets. We extend it to the robotics domain through integration with a closed-loop Reinforcement Learning (RL) framework. This extension allows reasoning in Embodied Artificial Intelligence (EmbodiedAI) settings without relying solely on distillation of large models through Supervised Fine-Tuning (SFT). We show that small-scale LLMs can achieve effective reasoning performance by learning through closed-loop interaction with their environment, enabling tasks that previously required significantly larger models. A gain of 20.2 percentage points over the SFT-based baseline is observed with a Qwen2.5-1.5B model. Using the proposed training procedure, Qwen2.5-3B achieves a 63.3% control adaptability score, surpassing the 58.5% obtained by the much larger, cloud-bound GPT-4o. These results highlight that practical, on-board deployment of small LLMs is not only feasible but can outperform larger models when trained through environmental interaction, underscoring the importance of an interactive, embodied learning framework for robotic EmbodiedAI, one grounded in practical experience rather than static supervision.

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-boyle25a,
  title     = {RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning},
  author    = {Boyle, Liam and Baumann, Nicolas and Sivasothilingam, Paviththiren and Magno, Michele and Benini, Luca},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {4074--4092},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/boyle25a/boyle25a.pdf},
  url       = {https://proceedings.mlr.press/v305/boyle25a.html},
  abstract  = {Future robotic systems operating in real-world environments require on-board embodied intelligence without continuous cloud connection, balancing capabilities with constraints on computational power and memory. This work presents an extension of the R1-zero approach, which enables the usage of small parameter-count Large Language Models (LLMs) in the robotic domain. The R1-Zero approach was originally developed to enable mathematical reasoning in LLMs using static datasets. We extend it to the robotics domain through integration with a closed-loop Reinforcement Learning (RL) framework. This extension allows reasoning in Embodied Artificial Intelligence (EmbodiedAI) settings without relying solely on distillation of large models through Supervised Fine-Tuning (SFT). We show that small-scale LLMs can achieve effective reasoning performance by learning through closed-loop interaction with their environment, which enables tasks that previously required significantly larger models. A performance gain of 20.2% points over the SFT-based baseline is observed with a Qwen2.5-1.5B model. Using the proposed training procedure, Qwen2.5-3B achieves a 63.3% control adaptability score, surpassing the 58.5% obtained by the much larger, cloud-bound GPT-4o. These results highlight that practical, on-board deployment of small LLMs is not only feasible but can outperform larger models when trained through environmental interaction, underscoring the importance of an interactive, embodied learning framework for robotic EmbodiedAI — one grounded in practical experience rather than static supervision.}
}
Endnote
%0 Conference Paper
%T RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning
%A Liam Boyle
%A Nicolas Baumann
%A Paviththiren Sivasothilingam
%A Michele Magno
%A Luca Benini
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-boyle25a
%I PMLR
%P 4074--4092
%U https://proceedings.mlr.press/v305/boyle25a.html
%V 305
%X Future robotic systems operating in real-world environments require on-board embodied intelligence without continuous cloud connection, balancing capabilities with constraints on computational power and memory. This work presents an extension of the R1-zero approach, which enables the usage of small parameter-count Large Language Models (LLMs) in the robotic domain. The R1-Zero approach was originally developed to enable mathematical reasoning in LLMs using static datasets. We extend it to the robotics domain through integration with a closed-loop Reinforcement Learning (RL) framework. This extension allows reasoning in Embodied Artificial Intelligence (EmbodiedAI) settings without relying solely on distillation of large models through Supervised Fine-Tuning (SFT). We show that small-scale LLMs can achieve effective reasoning performance by learning through closed-loop interaction with their environment, which enables tasks that previously required significantly larger models. A performance gain of 20.2% points over the SFT-based baseline is observed with a Qwen2.5-1.5B model. Using the proposed training procedure, Qwen2.5-3B achieves a 63.3% control adaptability score, surpassing the 58.5% obtained by the much larger, cloud-bound GPT-4o. These results highlight that practical, on-board deployment of small LLMs is not only feasible but can outperform larger models when trained through environmental interaction, underscoring the importance of an interactive, embodied learning framework for robotic EmbodiedAI — one grounded in practical experience rather than static supervision.
APA
Boyle, L., Baumann, N., Sivasothilingam, P., Magno, M. & Benini, L. (2025). RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:4074-4092. Available from https://proceedings.mlr.press/v305/boyle25a.html.