RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation

Hanxiao Jiang; Binghao Huang; Ruihai Wu; Zhuoran Li; Shubham Garg; Hooshang Nayyeri; Shenlong Wang; Yunzhu Li

Proceedings of The 8th Conference on Robot Learning, PMLR 270:3027-3052, 2025.

Abstract

We introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG) that captures the structure of the underlying environment. The ACSG accounts for both low-level information (geometry and semantics) and high-level information (action-conditioned relationships between different entities) in the scene. To this end, we present the Robotic Exploration (RoboEXP) system, which incorporates the Large Multimodal Model (LMM) and an explicit memory design to enhance our system’s capabilities. The robot reasons about what and how to explore an object, accumulating new information through the interaction process and incrementally constructing the ACSG. Leveraging the constructed ACSG, we illustrate the effectiveness and efficiency of our RoboEXP system in facilitating a wide range of real-world manipulation tasks involving rigid, articulated objects, nested objects, and deformable objects. Project Page: https://jianghanxiao.github.io/roboexp-web/

Cite this Paper

BibTeX

@InProceedings{pmlr-v270-jiang25c,
  title = 	 {RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation},
  author =       {Jiang, Hanxiao and Huang, Binghao and Wu, Ruihai and Li, Zhuoran and Garg, Shubham and Nayyeri, Hooshang and Wang, Shenlong and Li, Yunzhu},
  booktitle = 	 {Proceedings of The 8th Conference on Robot Learning},
  pages = 	 {3027--3052},
  year = 	 {2025},
  editor = 	 {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = 	 {270},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v270/main/assets/jiang25c/jiang25c.pdf},
  url = 	 {https://proceedings.mlr.press/v270/jiang25c.html},
  abstract = 	 {We introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG) that captures the structure of the underlying environment. The ACSG accounts for both low-level information (geometry and semantics) and high-level information (action-conditioned relationships between different entities) in the scene. To this end, we present the Robotic Exploration (RoboEXP) system, which incorporates the Large Multimodal Model (LMM) and an explicit memory design to enhance our system’s capabilities. The robot reasons about what and how to explore an object, accumulating new information through the interaction process and incrementally constructing the ACSG. Leveraging the constructed ACSG, we illustrate the effectiveness and efficiency of our RoboEXP system in facilitating a wide range of real-world manipulation tasks involving rigid, articulated objects, nested objects, and deformable objects. Project Page: https://jianghanxiao.github.io/roboexp-web/}
}

Endnote

%0 Conference Paper
%T RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation
%A Hanxiao Jiang
%A Binghao Huang
%A Ruihai Wu
%A Zhuoran Li
%A Shubham Garg
%A Hooshang Nayyeri
%A Shenlong Wang
%A Yunzhu Li
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-jiang25c
%I PMLR
%P 3027--3052
%U https://proceedings.mlr.press/v270/jiang25c.html
%V 270
%X We introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG) that captures the structure of the underlying environment. The ACSG accounts for both low-level information (geometry and semantics) and high-level information (action-conditioned relationships between different entities) in the scene. To this end, we present the Robotic Exploration (RoboEXP) system, which incorporates the Large Multimodal Model (LMM) and an explicit memory design to enhance our system’s capabilities. The robot reasons about what and how to explore an object, accumulating new information through the interaction process and incrementally constructing the ACSG. Leveraging the constructed ACSG, we illustrate the effectiveness and efficiency of our RoboEXP system in facilitating a wide range of real-world manipulation tasks involving rigid, articulated objects, nested objects, and deformable objects. Project Page: https://jianghanxiao.github.io/roboexp-web/

APA

Jiang, H., Huang, B., Wu, R., Li, Z., Garg, S., Nayyeri, H., Wang, S. & Li, Y. (2025). RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:3027-3052. Available from https://proceedings.mlr.press/v270/jiang25c.html.
