Geometric Red-Teaming for Robotic Manipulation

Divyam Goel, Yufei Wang, Tiancheng Wu, Guixiu Qiao, Pavel Piliptchak, David Held, Zackory Erickson
Proceedings of The 9th Conference on Robot Learning, PMLR 305:41-67, 2025.

Abstract

Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation. We introduce a red-teaming framework that probes robustness through object-centric geometric perturbations, automatically generating CrashShapes: structurally valid, user-constrained mesh deformations that trigger catastrophic failures in pre-trained manipulation policies. The method integrates a Jacobian field-based deformation model with a gradient-free, simulator-in-the-loop optimization strategy. Across insertion, articulation, and grasping tasks, our approach consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks. By combining task-level policy rollouts with constraint-aware shape exploration, we aim to build a general-purpose framework for structured, object-centric robustness evaluation in robotic manipulation. We additionally show that fine-tuning on individual CrashShapes, a process we refer to as blue-teaming, improves task success by up to 60 percentage points on those shapes while preserving performance on the original object, demonstrating the utility of red-teamed geometries for targeted policy refinement. Finally, we validate both red-teaming and blue-teaming results with a real robotic arm, observing that simulated CrashShapes reduce task success from 90% to as low as 22.5%, and that blue-teaming restores success to as much as 90% on the corresponding real-world geometry, closely matching simulation outcomes.
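To make the phrase "gradient-free, simulator-in-the-loop optimization" concrete, the following is a minimal sketch under heavy assumptions, not the authors' implementation. The abstract does not specify the optimizer, so simple random-perturbation hill climbing is swapped in here. The deformation is reduced to a short parameter vector, the user constraint to a per-parameter bound, and `toy_success_rate` is a hypothetical stand-in for running policy rollouts in a simulator. All names and parameters are illustrative.

```python
import random


def toy_success_rate(deformation):
    """Hypothetical stand-in for simulator rollouts (an assumption, not the
    paper's simulator): success drops as the shape parameters move away
    from the nominal object, which is the all-zeros vector."""
    distance = sum(d * d for d in deformation) ** 0.5
    return max(0.0, 0.9 - 0.5 * distance)


def red_team_search(evaluate, dim=4, bound=1.0, iters=200, step=0.2, seed=0):
    """Gradient-free hill climbing over bounded deformation parameters.

    Each iteration perturbs the current best parameters, clips them to the
    user-set bound, evaluates them with the black-box `evaluate` function,
    and keeps the candidate only if it lowers the success rate.
    """
    rng = random.Random(seed)
    best = [0.0] * dim
    best_score = evaluate(best)
    for _ in range(iters):
        candidate = [
            min(bound, max(-bound, value + rng.gauss(0.0, step)))
            for value in best
        ]
        score = evaluate(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


shape, success = red_team_search(toy_success_rate)
print(f"lowest success rate found: {success:.3f}")
```

The search only ever calls `evaluate` as a black box, which is what makes it usable with a non-differentiable simulator. A real deformation model and validity checks would replace the clipping step.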

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-goel25a,
  title = {Geometric Red-Teaming for Robotic Manipulation},
  author = {Goel, Divyam and Wang, Yufei and Wu, Tiancheng and Qiao, Guixiu and Piliptchak, Pavel and Held, David and Erickson, Zackory},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages = {41--67},
  year = {2025},
  editor = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume = {305},
  series = {Proceedings of Machine Learning Research},
  month = {27--30 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/goel25a/goel25a.pdf},
  url = {https://proceedings.mlr.press/v305/goel25a.html},
  abstract = {Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation. We introduce a red-teaming framework that probes robustness through object-centric geometric perturbations, automatically generating CrashShapes---structurally valid, user-constrained mesh deformations that trigger catastrophic failures in pre-trained manipulation policies. The method integrates a Jacobian field--based deformation model with a gradient-free, simulator-in-the-loop optimization strategy. Across insertion, articulation, and grasping tasks, our approach consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks. By combining task-level policy rollouts with constraint-aware shape exploration, we aim to build a general purpose framework for structured, object-centric robustness evaluation in robotic manipulation. We additionally show that fine-tuning on individual CrashShapes, a process we refer to as blue-teaming, improves task success by up to 60 percentage points on those shapes, while preserving performance on the original object, demonstrating the utility of red-teamed geometries for targeted policy refinement. Finally, we validate both red-teaming and blue-teaming results with a real robotic arm, observing that simulated CrashShapes reduce task success from 90% to as low as 22.5%, and that blue-teaming recovers performance to up to 90% on the corresponding real-world geometry---closely matching simulation outcomes.}
}
Endnote
%0 Conference Paper
%T Geometric Red-Teaming for Robotic Manipulation
%A Divyam Goel
%A Yufei Wang
%A Tiancheng Wu
%A Guixiu Qiao
%A Pavel Piliptchak
%A David Held
%A Zackory Erickson
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-goel25a
%I PMLR
%P 41--67
%U https://proceedings.mlr.press/v305/goel25a.html
%V 305
%X Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation. We introduce a red-teaming framework that probes robustness through object-centric geometric perturbations, automatically generating CrashShapes—structurally valid, user-constrained mesh deformations that trigger catastrophic failures in pre-trained manipulation policies. The method integrates a Jacobian field–based deformation model with a gradient-free, simulator-in-the-loop optimization strategy. Across insertion, articulation, and grasping tasks, our approach consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks. By combining task-level policy rollouts with constraint-aware shape exploration, we aim to build a general purpose framework for structured, object-centric robustness evaluation in robotic manipulation. We additionally show that fine-tuning on individual CrashShapes, a process we refer to as blue-teaming, improves task success by up to 60 percentage points on those shapes, while preserving performance on the original object, demonstrating the utility of red-teamed geometries for targeted policy refinement. Finally, we validate both red-teaming and blue-teaming results with a real robotic arm, observing that simulated CrashShapes reduce task success from 90% to as low as 22.5%, and that blue-teaming recovers performance to up to 90% on the corresponding real-world geometry—closely matching simulation outcomes.
APA
Goel, D., Wang, Y., Wu, T., Qiao, G., Piliptchak, P., Held, D. & Erickson, Z. (2025). Geometric Red-Teaming for Robotic Manipulation. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:41-67. Available from https://proceedings.mlr.press/v305/goel25a.html.
