Frame Mining: a Free Lunch for Learning Robotic Manipulation from 3D Point Clouds

Minghua Liu, Xuanlin Li, Zhan Ling, Yangyan Li, Hao Su
Proceedings of The 6th Conference on Robot Learning, PMLR 205:527-538, 2023.

Abstract

We study how choices of input point cloud coordinate frames impact learning of manipulation skills from 3D point clouds. There exist a variety of coordinate frame choices to normalize captured robot-object-interaction point clouds. We find that different frames have a profound effect on agent learning performance, and the trend is similar across 3D backbone networks. In particular, the end-effector frame and the target-part frame achieve higher training efficiency than the commonly used world frame and robot-base frame in many tasks, intuitively because they provide helpful alignments among point clouds across time steps and thus can simplify visual module learning. Moreover, the well-performing frames vary across tasks, and some tasks may benefit from multiple frame candidates. We thus propose FrameMiners to adaptively select candidate frames and fuse their merits in a task-agnostic manner. Experimentally, FrameMiners achieves on-par or significantly higher performance than the best single-frame version on five fully physical manipulation tasks adapted from ManiSkill and OCRTOC. Without changing existing camera placements or adding extra cameras, point cloud frame mining can serve as a free lunch to improve 3D manipulation learning.
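The practical core of the observation above is a rigid change of coordinates: the same captured points can be re-expressed relative to the end-effector (or a target part) before being fed to the 3D backbone, and a multi-frame agent can encode several such candidate clouds and fuse their features. The sketch below is an illustrative example only, not code from the paper; the pose convention, the helper names (pose_to_matrix, to_frame), and the example poses are assumptions made for this sketch.

```python
import numpy as np

def pose_to_matrix(position, quaternion_wxyz):
    """Build a 4x4 world-from-frame transform from a position and a unit
    quaternion (w, x, y, z). Hypothetical helper for this sketch."""
    w, x, y, z = quaternion_wxyz
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = position
    return T

def to_frame(points_world, T_world_frame):
    """Re-express an (N, 3) world-frame point cloud in the given frame by
    applying the inverse of that frame's world pose to every point."""
    T_frame_world = np.linalg.inv(T_world_frame)
    homo = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    return (homo @ T_frame_world.T)[:, :3]

# Example: one observation, re-expressed in several candidate frames.
points_world = np.random.rand(1024, 3)                      # fused depth-camera cloud (placeholder data)
T_world_ee = pose_to_matrix([0.4, 0.0, 0.3], [1, 0, 0, 0])  # end-effector pose, assumed known from proprioception
T_world_base = pose_to_matrix([0.0, 0.0, 0.0], [1, 0, 0, 0])

candidate_clouds = {
    "world": points_world,
    "robot_base": to_frame(points_world, T_world_base),
    "end_effector": to_frame(points_world, T_world_ee),
}
# A multi-frame agent would encode each candidate cloud with the 3D backbone
# and fuse the resulting features (e.g., by concatenation) before the policy head.
```

Because the end-effector and target-part frames move with the interaction, clouds expressed in them stay roughly aligned across time steps, which is the intuition the abstract gives for their higher training efficiency.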

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-liu23d,
  title     = {Frame Mining: a Free Lunch for Learning Robotic Manipulation from 3D Point Clouds},
  author    = {Liu, Minghua and Li, Xuanlin and Ling, Zhan and Li, Yangyan and Su, Hao},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {527--538},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/liu23d/liu23d.pdf},
  url       = {https://proceedings.mlr.press/v205/liu23d.html},
}
APA
Liu, M., Li, X., Ling, Z., Li, Y., & Su, H. (2023). Frame Mining: a Free Lunch for Learning Robotic Manipulation from 3D Point Clouds. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:527-538. Available from https://proceedings.mlr.press/v205/liu23d.html.
