Geometric Red-Teaming for Robotic Manipulation

Divyam Goel; Yufei Wang; Tiancheng Wu; Guixiu Qiao; Pavel Piliptchak; David Held; Zackory Erickson

Proceedings of The 9th Conference on Robot Learning, PMLR 305:41-67, 2025.

Abstract

Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation. We introduce a red-teaming framework that probes robustness through object-centric geometric perturbations, automatically generating CrashShapes—structurally valid, user-constrained mesh deformations that trigger catastrophic failures in pre-trained manipulation policies. The method integrates a Jacobian field–based deformation model with a gradient-free, simulator-in-the-loop optimization strategy. Across insertion, articulation, and grasping tasks, our approach consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks. By combining task-level policy rollouts with constraint-aware shape exploration, we aim to build a general purpose framework for structured, object-centric robustness evaluation in robotic manipulation. We additionally show that fine-tuning on individual CrashShapes, a process we refer to as blue-teaming, improves task success by up to 60 percentage points on those shapes, while preserving performance on the original object, demonstrating the utility of red-teamed geometries for targeted policy refinement. Finally, we validate both red-teaming and blue-teaming results with a real robotic arm, observing that simulated CrashShapes reduce task success from 90% to as low as 22.5%, and that blue-teaming recovers performance to up to 90% on the corresponding real-world geometry—closely matching simulation outcomes.

Cite this Paper

BibTeX

@InProceedings{pmlr-v305-goel25a,
  title = 	 {Geometric Red-Teaming for Robotic Manipulation},
  author =       {Goel, Divyam and Wang, Yufei and Wu, Tiancheng and Qiao, Guixiu and Piliptchak, Pavel and Held, David and Erickson, Zackory},
  booktitle = 	 {Proceedings of The 9th Conference on Robot Learning},
  pages = 	 {41--67},
  year = 	 {2025},
  editor = 	 {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume = 	 {305},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {27--30 Sep},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v305/main/assets/goel25a/goel25a.pdf},
  url = 	 {https://proceedings.mlr.press/v305/goel25a.html},
  abstract = 	 {Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation.  We introduce a red-teaming framework that probes robustness through object-centric geometric perturbations, automatically generating CrashShapes—structurally valid, user-constrained mesh deformations that trigger catastrophic failures in pre-trained manipulation policies.  The method integrates a Jacobian field–based deformation model with a gradient-free, simulator-in-the-loop optimization strategy. Across insertion, articulation, and grasping tasks, our approach consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks.  By combining task-level policy rollouts with constraint-aware shape exploration, we aim to build a general purpose framework for structured, object-centric robustness evaluation in robotic manipulation. We additionally show that fine-tuning on individual CrashShapes, a process we refer to as blue-teaming, improves task success by up to 60 percentage points on those shapes, while preserving performance on the original object, demonstrating the utility of red-teamed geometries for targeted policy refinement. Finally, we validate both red-teaming and blue-teaming results with a real robotic arm, observing that simulated CrashShapes reduce task success from 90% to as low as 22.5%, and that blue-teaming recovers performance to up to 90% on the corresponding real-world geometry—closely matching simulation outcomes.}
}

Endnote

%0 Conference Paper
%T Geometric Red-Teaming for Robotic Manipulation
%A Divyam Goel
%A Yufei Wang
%A Tiancheng Wu
%A Guixiu Qiao
%A Pavel Piliptchak
%A David Held
%A Zackory Erickson
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-goel25a
%I PMLR
%P 41--67
%U https://proceedings.mlr.press/v305/goel25a.html
%V 305
%X Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation.  We introduce a red-teaming framework that probes robustness through object-centric geometric perturbations, automatically generating CrashShapes—structurally valid, user-constrained mesh deformations that trigger catastrophic failures in pre-trained manipulation policies.  The method integrates a Jacobian field–based deformation model with a gradient-free, simulator-in-the-loop optimization strategy. Across insertion, articulation, and grasping tasks, our approach consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks.  By combining task-level policy rollouts with constraint-aware shape exploration, we aim to build a general purpose framework for structured, object-centric robustness evaluation in robotic manipulation. We additionally show that fine-tuning on individual CrashShapes, a process we refer to as blue-teaming, improves task success by up to 60 percentage points on those shapes, while preserving performance on the original object, demonstrating the utility of red-teamed geometries for targeted policy refinement. Finally, we validate both red-teaming and blue-teaming results with a real robotic arm, observing that simulated CrashShapes reduce task success from 90% to as low as 22.5%, and that blue-teaming recovers performance to up to 90% on the corresponding real-world geometry—closely matching simulation outcomes.

APA

Goel, D., Wang, Y., Wu, T., Qiao, G., Piliptchak, P., Held, D. & Erickson, Z. (2025). Geometric Red-Teaming for Robotic Manipulation. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:41-67. Available from https://proceedings.mlr.press/v305/goel25a.html.
