D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation

I-Chun Arthur Liu, Jason Chen, Gaurav S. Sukhatme, Daniel Seita
Proceedings of The 9th Conference on Robot Learning, PMLR 305:3569-3588, 2025.

Abstract

Learning bimanual manipulation is challenging due to its high dimensionality and the tight coordination required between the two arms. Eye-in-hand imitation learning, which uses wrist-mounted cameras, simplifies perception by focusing on task-relevant views. However, collecting diverse demonstrations remains costly, motivating the need for scalable data augmentation. While prior work has explored visual augmentation in single-arm settings, extending these approaches to bimanual manipulation requires generating viewpoint-consistent observations across both arms and producing corresponding action labels that are both valid and feasible. In this work, we propose Diffusion for COordinated Dual-arm Data Augmentation (D-CODA), a method for offline data augmentation tailored to eye-in-hand bimanual imitation learning. D-CODA trains a diffusion model to synthesize novel, viewpoint-consistent wrist-camera images for both arms while simultaneously generating joint-space action labels, and it employs constrained optimization to ensure that augmented states involving gripper-to-object contact satisfy constraints suitable for bimanual coordination. We evaluate D-CODA on 5 simulated and 3 real-world tasks. Our results across 2,250 simulation trials and 180 real-world trials show that it outperforms baselines and ablations, demonstrating its potential for scalable data augmentation in eye-in-hand bimanual manipulation. Our anonymous website is at: https://dcodaaug.github.io/D-CODA/.
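
To make the pipeline the abstract outlines more concrete, the following is a minimal, hypothetical Python sketch of one augmentation step: sample small wrist-camera pose perturbations for both arms, constrain them when a gripper is in contact with an object, and query a trained diffusion model for the corresponding viewpoint-consistent wrist images and action labels. The function names, interfaces, and the contact-handling shortcut are illustrative assumptions for exposition, not the authors' implementation.

import numpy as np

# Hypothetical placeholder for the trained diffusion model described in the
# abstract; its interface is assumed, not taken from the D-CODA release.
def synthesize_wrist_views(diffusion_model, left_img, right_img, d_left, d_right):
    """Return viewpoint-consistent wrist-camera images for both arms,
    conditioned on the original images and the sampled pose perturbations."""
    return diffusion_model(left_img, right_img, d_left, d_right)

def sample_pose_perturbation(max_trans=0.02, max_rot=0.1, rng=None):
    """Sample a small wrist-camera pose offset (translation in meters,
    rotation in radians) for one arm."""
    rng = rng or np.random.default_rng()
    return {
        "translation": rng.uniform(-max_trans, max_trans, size=3),
        "rotation_rpy": rng.uniform(-max_rot, max_rot, size=3),
    }

def augment_step(diffusion_model, step, in_contact, rng=None):
    """Produce one augmented (observation, action) pair from a demo step.

    `step` holds the original wrist images and joint-space action label.
    When either gripper is in contact with an object, the perturbation is
    zeroed out here as a crude stand-in for the paper's constrained
    optimization, which instead enforces coordination constraints on the
    perturbed states."""
    d_left = sample_pose_perturbation(rng=rng)
    d_right = sample_pose_perturbation(rng=rng)
    if in_contact:
        for d in (d_left, d_right):
            d["translation"][:] = 0.0
            d["rotation_rpy"][:] = 0.0

    new_left, new_right = synthesize_wrist_views(
        diffusion_model, step["left_img"], step["right_img"], d_left, d_right)

    # The corresponding joint-space action label would also be produced by the
    # model / optimization; copying the original is only a placeholder.
    return {"left_img": new_left, "right_img": new_right,
            "action": step["action"].copy()}

Running augment_step over every timestep of every collected demonstration would yield an enlarged, viewpoint-diverse dataset for training the downstream imitation-learning policy, which is the role D-CODA plays as an offline augmentation method.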

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-liu25f,
  title     = {D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation},
  author    = {Liu, I-Chun Arthur and Chen, Jason and Sukhatme, Gaurav S. and Seita, Daniel},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {3569--3588},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/liu25f/liu25f.pdf},
  url       = {https://proceedings.mlr.press/v305/liu25f.html},
  abstract  = {Learning bimanual manipulation is challenging due to its high dimensionality and tight coordination required between two arms. Eye-in-hand imitation learning, which uses wrist-mounted cameras, simplifies perception by focusing on task-relevant views. However, collecting diverse demonstrations remains costly, motivating the need for scalable data augmentation. While prior work has explored visual augmentation in single-arm settings, extending these approaches to bimanual manipulation requires generating viewpoint-consistent observations across both arms and producing corresponding action labels that are both valid and feasible. In this work, we propose Diffusion for COordinated Dual-arm Data Augmentation (D-CODA), a method for offline data augmentation tailored to eye-in-hand bimanual imitation learning that trains a diffusion model to synthesize novel, viewpoint-consistent wrist-camera images for both arms while simultaneously generating joint-space action labels. It employs constrained optimization to ensure that augmented states involving gripper-to-object contacts adhere to constraints suitable for bimanual coordination. We evaluate D-CODA on 5 simulated and 3 real-world tasks. Our results across 2250 simulation trials and 180 real-world trials demonstrate that it outperforms baselines and ablations, showing its potential for scalable data augmentation in eye-in-hand bimanual manipulation. Our anonymous website is at: https://dcodaaug.github.io/D-CODA/.}
}
Endnote
%0 Conference Paper
%T D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation
%A I-Chun Arthur Liu
%A Jason Chen
%A Gaurav S. Sukhatme
%A Daniel Seita
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-liu25f
%I PMLR
%P 3569--3588
%U https://proceedings.mlr.press/v305/liu25f.html
%V 305
%X Learning bimanual manipulation is challenging due to its high dimensionality and tight coordination required between two arms. Eye-in-hand imitation learning, which uses wrist-mounted cameras, simplifies perception by focusing on task-relevant views. However, collecting diverse demonstrations remains costly, motivating the need for scalable data augmentation. While prior work has explored visual augmentation in single-arm settings, extending these approaches to bimanual manipulation requires generating viewpoint-consistent observations across both arms and producing corresponding action labels that are both valid and feasible. In this work, we propose Diffusion for COordinated Dual-arm Data Augmentation (D-CODA), a method for offline data augmentation tailored to eye-in-hand bimanual imitation learning that trains a diffusion model to synthesize novel, viewpoint-consistent wrist-camera images for both arms while simultaneously generating joint-space action labels. It employs constrained optimization to ensure that augmented states involving gripper-to-object contacts adhere to constraints suitable for bimanual coordination. We evaluate D-CODA on 5 simulated and 3 real-world tasks. Our results across 2250 simulation trials and 180 real-world trials demonstrate that it outperforms baselines and ablations, showing its potential for scalable data augmentation in eye-in-hand bimanual manipulation. Our anonymous website is at: https://dcodaaug.github.io/D-CODA/.
APA
Liu, I.A., Chen, J., Sukhatme, G.S. & Seita, D. (2025). D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:3569-3588. Available from https://proceedings.mlr.press/v305/liu25f.html.