RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation

Chongkai Gao, Zhengrong Xue, Shuying Deng, Tianhai Liang, Siqi Yang, Lin Shao, Huazhe Xu
Proceedings of The 8th Conference on Robot Learning, PMLR 270:2164-2182, 2025.

Abstract

We present RiEMann, an end-to-end near Real-time SE(3)-Equivariant Robot Manipulation imitation learning framework from scene point cloud input. Compared to previous methods that rely on descriptor field matching, RiEMann directly predicts the target actions for manipulation without any object segmentation. RiEMann can efficiently train the visuomotor policy from scratch with 5 to 10 demonstrations for a manipulation task, generalizes to unseen SE(3) transformations and instances of target objects, resists visual interference of distracting objects, and follows the near real-time pose change of the target object. The scalable SE(3)-equivariant action space of RiEMann supports both pick-and-place tasks and articulated object manipulation tasks. In simulation and real-world 6-DOF robot manipulation experiments, we test RiEMann on 5 categories of manipulation tasks with a total of 25 variants and show that RiEMann outperforms baselines in both task success rates and SE(3) geodesic distance errors (reduced by 68.6%), and achieves 5.4 frames per second (fps) network inference speed.

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-gao25a,
  title     = {RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation},
  author    = {Gao, Chongkai and Xue, Zhengrong and Deng, Shuying and Liang, Tianhai and Yang, Siqi and Shao, Lin and Xu, Huazhe},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {2164--2182},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/gao25a/gao25a.pdf},
  url       = {https://proceedings.mlr.press/v270/gao25a.html},
  abstract  = {We present RiEMann, an end-to-end near Real-time SE(3)-Equivariant Robot Manipulation imitation learning framework from scene point cloud input. Compared to previous methods that rely on descriptor field matching, RiEMann directly predicts the target actions for manipulation without any object segmentation. RiEMann can efficiently train the visuomotor policy from scratch with 5 to 10 demonstrations for a manipulation task, generalizes to unseen SE(3) transformations and instances of target objects, resists visual interference of distracting objects, and follows the near real-time pose change of the target object. The scalable SE(3)-equivariant action space of RiEMann supports both pick-and-place tasks and articulated object manipulation tasks. In simulation and real-world 6-DOF robot manipulation experiments, we test RiEMann on 5 categories of manipulation tasks with a total of 25 variants and show that RiEMann outperforms baselines in both task success rates and SE(3) geodesic distance errors (reduced by 68.6%), and achieves 5.4 frames per second (fps) network inference speed.}
}
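The BibTeX record above can be used directly from a LaTeX document. A minimal sketch follows; the file name `references.bib` and the use of the classic BibTeX `plain` style are illustrative assumptions, and only the entry key `pmlr-v270-gao25a` comes from the record itself:

```latex
% main.tex -- minimal document citing the entry above.
% Assumes the BibTeX record is saved in references.bib.
\documentclass{article}
\begin{document}
RiEMann~\cite{pmlr-v270-gao25a} predicts SE(3)-equivariant
manipulation actions directly from scene point clouds,
without point cloud segmentation.
\bibliographystyle{plain}
\bibliography{references}
\end{document}
```

Compiling with `pdflatex main`, then `bibtex main`, then `pdflatex main` twice resolves the citation and bibliography entry.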
Endnote
%0 Conference Paper
%T RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation
%A Chongkai Gao
%A Zhengrong Xue
%A Shuying Deng
%A Tianhai Liang
%A Siqi Yang
%A Lin Shao
%A Huazhe Xu
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-gao25a
%I PMLR
%P 2164--2182
%U https://proceedings.mlr.press/v270/gao25a.html
%V 270
%X We present RiEMann, an end-to-end near Real-time SE(3)-Equivariant Robot Manipulation imitation learning framework from scene point cloud input. Compared to previous methods that rely on descriptor field matching, RiEMann directly predicts the target actions for manipulation without any object segmentation. RiEMann can efficiently train the visuomotor policy from scratch with 5 to 10 demonstrations for a manipulation task, generalizes to unseen SE(3) transformations and instances of target objects, resists visual interference of distracting objects, and follows the near real-time pose change of the target object. The scalable SE(3)-equivariant action space of RiEMann supports both pick-and-place tasks and articulated object manipulation tasks. In simulation and real-world 6-DOF robot manipulation experiments, we test RiEMann on 5 categories of manipulation tasks with a total of 25 variants and show that RiEMann outperforms baselines in both task success rates and SE(3) geodesic distance errors (reduced by 68.6%), and achieves 5.4 frames per second (fps) network inference speed.
APA
Gao, C., Xue, Z., Deng, S., Liang, T., Yang, S., Shao, L. & Xu, H. (2025). RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:2164-2182. Available from https://proceedings.mlr.press/v270/gao25a.html.