RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation

Zongzheng Zhang, Chenghao Yue, Haobo Xu, Minwen Liao, Xianglin Qi, Huan-ang Gao, Ziwei Wang, Hao Zhao
Proceedings of The 9th Conference on Robot Learning, PMLR 305:3537-3568, 2025.

Abstract

Robotic chemists promise to both liberate human experts from repetitive tasks and accelerate scientific discovery, yet remain in their infancy. Chemical experiments involve long-horizon procedures over hazardous and deformable substances, where success requires not only task completion but also strict compliance with experimental norms. To address these challenges, we propose RoboChemist, a dual-loop framework that integrates Vision-Language Models (VLMs) with Vision-Language-Action (VLA) models. Unlike prior VLM-based systems (e.g., VoxPoser, ReKep) that rely on depth perception and struggle with transparent labware, and existing VLA systems (e.g., RDT, $\pi_0$) that lack semantic-level feedback for complex tasks, our method leverages a VLM to serve as (1) a planner to decompose tasks into primitive actions, (2) a visual prompt generator to guide VLA models, and (3) a monitor to assess task success and regulatory compliance. Notably, we introduce a VLA interface that accepts image-based visual targets from the VLM, enabling precise, goal-conditioned control. Our system successfully executes both primitive actions and complete multi-step chemistry protocols. Results show significant improvements in both success rate and compliance rate over state-of-the-art VLM and VLA baselines, while also demonstrating strong generalization to objects and tasks. Code, data, and models will be released.

Cite this Paper
BibTeX
@InProceedings{pmlr-v305-zhang25i,
  title     = {RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation},
  author    = {Zhang, Zongzheng and Yue, Chenghao and Xu, Haobo and Liao, Minwen and Qi, Xianglin and Gao, Huan-ang and Wang, Ziwei and Zhao, Hao},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {3537--3568},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/zhang25i/zhang25i.pdf},
  url       = {https://proceedings.mlr.press/v305/zhang25i.html},
  abstract  = {Robotic chemists promise to both liberate human experts from repetitive tasks and accelerate scientific discovery, yet remain in their infancy. Chemical experiments involve long-horizon procedures over hazardous and deformable substances, where success requires not only task completion but also strict compliance with experimental norms. To address these challenges, we propose RoboChemist, a dual-loop framework that integrates Vision-Language Models (VLMs) with Vision-Language-Action (VLA) models. Unlike prior VLM-based systems (e.g., VoxPoser, ReKep) that rely on depth perception and struggle with transparent labware, and existing VLA systems (e.g., RDT, $\pi_0$) that lack semantic-level feedback for complex tasks, our method leverages a VLM to serve as (1) a planner to decompose tasks into primitive actions, (2) a visual prompt generator to guide VLA models, and (3) a monitor to assess task success and regulatory compliance. Notably, we introduce a VLA interface that accepts image-based visual targets from the VLM, enabling precise, goal-conditioned control. Our system successfully executes both primitive actions and complete multi-step chemistry protocols. Results show significant improvements in both success rate and compliance rate over state-of-the-art VLM and VLA baselines, while also demonstrating strong generalization to objects and tasks. Code, data, and models will be released.}
}
Endnote
%0 Conference Paper
%T RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation
%A Zongzheng Zhang
%A Chenghao Yue
%A Haobo Xu
%A Minwen Liao
%A Xianglin Qi
%A Huan-ang Gao
%A Ziwei Wang
%A Hao Zhao
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-zhang25i
%I PMLR
%P 3537--3568
%U https://proceedings.mlr.press/v305/zhang25i.html
%V 305
%X Robotic chemists promise to both liberate human experts from repetitive tasks and accelerate scientific discovery, yet remain in their infancy. Chemical experiments involve long-horizon procedures over hazardous and deformable substances, where success requires not only task completion but also strict compliance with experimental norms. To address these challenges, we propose RoboChemist, a dual-loop framework that integrates Vision-Language Models (VLMs) with Vision-Language-Action (VLA) models. Unlike prior VLM-based systems (e.g., VoxPoser, ReKep) that rely on depth perception and struggle with transparent labware, and existing VLA systems (e.g., RDT, $\pi_0$) that lack semantic-level feedback for complex tasks, our method leverages a VLM to serve as (1) a planner to decompose tasks into primitive actions, (2) a visual prompt generator to guide VLA models, and (3) a monitor to assess task success and regulatory compliance. Notably, we introduce a VLA interface that accepts image-based visual targets from the VLM, enabling precise, goal-conditioned control. Our system successfully executes both primitive actions and complete multi-step chemistry protocols. Results show significant improvements in both success rate and compliance rate over state-of-the-art VLM and VLA baselines, while also demonstrating strong generalization to objects and tasks. Code, data, and models will be released.
APA
Zhang, Z., Yue, C., Xu, H., Liao, M., Qi, X., Gao, H., Wang, Z. & Zhao, H. (2025). RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:3537-3568. Available from https://proceedings.mlr.press/v305/zhang25i.html.

Related Material