One Demo is Worth a Thousand Trajectories: Action-View Augmentation for Visuomotor Policies

Chuer Pan, Litian Liang, Dominik Bauer, Eric Cousineau, Benjamin Burchfiel, Siyuan Feng, Shuran Song
Proceedings of The 9th Conference on Robot Learning, PMLR 305:3902-3914, 2025.

Abstract

Visuomotor policies for manipulation have demonstrated remarkable potential in modeling complex robotic behaviors, yet minor alterations in the robot’s initial configuration and unseen obstacles easily lead to out-of-distribution observations. Without extensive additional data collection, such observations result in catastrophic execution failures. In this work, we introduce an effective data augmentation framework that generates visually realistic fisheye image sequences and corresponding physically feasible action trajectories from real-world eye-in-hand demonstrations, captured with a portable parallel gripper equipped with a single fisheye camera. We introduce a novel Gaussian Splatting formulation, adapted to wide-FoV fisheye cameras, to reconstruct and edit the 3D scene with unseen objects. We use trajectory optimization to generate smooth, collision-free, view-rendering-friendly action trajectories, and render visual observations from the corresponding novel views. Comprehensive experiments in simulation and the real world show that our augmentation framework improves the success rate on various manipulation tasks, both in the original scene and in augmented scenes with obstacles that require collision avoidance.
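
The paper’s fisheye-adapted Gaussian Splatting formulation is not reproduced here, but the core difficulty it addresses can be illustrated with a standard camera model: a wide-FoV fisheye lens maps viewing angle, rather than depth-normalized pinhole coordinates, to image radius. The sketch below is a minimal illustration assuming an equidistant (f-theta) fisheye model; the function name, the choice of model, and the NumPy-only setup are our own assumptions and not the authors’ implementation.

# Minimal sketch: projecting 3D Gaussian centers through an equidistant
# (f-theta) fisheye model, one plausible ingredient of rendering splats
# for a wide-FoV eye-in-hand camera. Hypothetical; not the paper's
# actual formulation.
import numpy as np

def project_equidistant_fisheye(points_cam, f, cx, cy):
    """Project 3D points (N, 3) in the camera frame to fisheye pixels (N, 2).

    Equidistant model: image radius is proportional to viewing angle,
    rho = f * theta, where theta is the angle between the ray and the
    optical axis. Unlike a pinhole projection, rho stays bounded as
    theta approaches 90 degrees, which is what permits very wide FoV.
    """
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    r = np.sqrt(x ** 2 + y ** 2) + 1e-12          # distance from optical axis
    theta = np.arctan2(r, z)                      # viewing angle in [0, pi)
    rho = f * theta                               # equidistant mapping
    u = cx + rho * x / r
    v = cy + rho * y / r
    return np.stack([u, v], axis=-1)

# Usage: Gaussian centers expressed in the (moving) eye-in-hand camera frame.
centers_cam = np.array([[0.10, -0.05, 0.30],
                        [0.60,  0.40, 0.20]])     # second point is far off-axis
pixels = project_equidistant_fisheye(centers_cam, f=220.0, cx=640.0, cy=480.0)
print(pixels)

In a full splatting renderer the per-Gaussian 2D covariance would additionally require the Jacobian of this fisheye mapping in place of the pinhole Jacobian; that step, and the authors’ specific formulation, are omitted from this sketch.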

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-pan25a,
  title     = {One Demo is Worth a Thousand Trajectories: Action-View Augmentation for Visuomotor Policies},
  author    = {Pan, Chuer and Liang, Litian and Bauer, Dominik and Cousineau, Eric and Burchfiel, Benjamin and Feng, Siyuan and Song, Shuran},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {3902--3914},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/pan25a/pan25a.pdf},
  url       = {https://proceedings.mlr.press/v305/pan25a.html},
  abstract  = {Visuomotor policies for manipulation have demonstrated remarkable potential in modeling complex robotic behaviors, yet minor alterations in the robot’s initial configuration and unseen obstacles easily lead to out-of-distribution observations. Without extensive data collection effort, these result in catastrophic execution failures. In this work, we introduce an effective data augmentation framework that generates visually realistic fisheye image sequences and corresponding physically feasible action trajectories from real-world eye-in-hand demonstrations, captured with a portable parallel gripper with a single fisheye camera. We introduce a novel Gaussian Splatting formulation, adapted to wide FoV fisheye cameras, to reconstruct and edit the 3D scene with unseen objects. We utilize trajectory optimization to generate smooth, collision-free, view-rendering-friendly action trajectories and render visual observations from corresponding novel views. Comprehensive experiments in simulation and the real world show that our augmentation framework improves the success rate for various manipulation tasks in both the same scene and the augmented scene with obstacles requiring collision avoidance.}
}
Endnote
%0 Conference Paper
%T One Demo is Worth a Thousand Trajectories: Action-View Augmentation for Visuomotor Policies
%A Chuer Pan
%A Litian Liang
%A Dominik Bauer
%A Eric Cousineau
%A Benjamin Burchfiel
%A Siyuan Feng
%A Shuran Song
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-pan25a
%I PMLR
%P 3902--3914
%U https://proceedings.mlr.press/v305/pan25a.html
%V 305
%X Visuomotor policies for manipulation have demonstrated remarkable potential in modeling complex robotic behaviors, yet minor alterations in the robot’s initial configuration and unseen obstacles easily lead to out-of-distribution observations. Without extensive data collection effort, these result in catastrophic execution failures. In this work, we introduce an effective data augmentation framework that generates visually realistic fisheye image sequences and corresponding physically feasible action trajectories from real-world eye-in-hand demonstrations, captured with a portable parallel gripper with a single fisheye camera. We introduce a novel Gaussian Splatting formulation, adapted to wide FoV fisheye cameras, to reconstruct and edit the 3D scene with unseen objects. We utilize trajectory optimization to generate smooth, collision-free, view-rendering-friendly action trajectories and render visual observations from corresponding novel views. Comprehensive experiments in simulation and the real world show that our augmentation framework improves the success rate for various manipulation tasks in both the same scene and the augmented scene with obstacles requiring collision avoidance.
APA
Pan, C., Liang, L., Bauer, D., Cousineau, E., Burchfiel, B., Feng, S. & Song, S. (2025). One Demo is Worth a Thousand Trajectories: Action-View Augmentation for Visuomotor Policies. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:3902-3914. Available from https://proceedings.mlr.press/v305/pan25a.html.