CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation

Youngsun Wi; Mark Van der Merwe; Pete Florence; Andy Zeng; Nima Fazeli

Proceedings of The 7th Conference on Robot Learning, PMLR 229:2753-2771, 2023.

Abstract

Making contact with purpose is a central part of robot manipulation and remains essential for many household tasks – from sweeping dust into a dustpan, to wiping tables; from erasing whiteboards, to applying paint. In this work, we investigate learning language-conditioned, vision-based manipulation policies wherein the action representation is, in fact, contact itself – predicting contact formations at which tools grasped by the robot should meet an observable surface. Our approach, Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation (CALAMARI), exhibits several advantages including (i) benefiting from existing visual-language models for pretrained spatial features, grounding instructions to behaviors, and for sim2real transfer; and (ii) factorizing perception and control over a natural boundary (i.e., contact) into two modules that synergize with each other, whereby action predictions can be aligned per pixel with image observations, and low-level controllers can optimize motion trajectories that maintain contact while avoiding penetration. Experiments show that CALAMARI outperforms existing state-of-the-art model architectures for a broad range of contact-rich tasks, and pushes new ground on embodiment-agnostic generalization to unseen objects with varying elasticity, geometry, and colors in both simulated and real-world settings.

Cite this Paper

BibTeX


@InProceedings{pmlr-v229-wi23a,
  title = 	 {CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation},
  author =       {Wi, Youngsun and {Van der Merwe}, Mark and Florence, Pete and Zeng, Andy and Fazeli, Nima},
  booktitle = 	 {Proceedings of The 7th Conference on Robot Learning},
  pages = 	 {2753--2771},
  year = 	 {2023},
  editor = 	 {Tan, Jie and Toussaint, Marc and Darvish, Kourosh},
  volume = 	 {229},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v229/wi23a/wi23a.pdf},
  url = 	 {https://proceedings.mlr.press/v229/wi23a.html},
  abstract = 	 {Making contact with purpose is a central part of robot manipulation and remains essential for many household tasks – from sweeping dust into a dustpan, to wiping tables; from erasing whiteboards, to applying paint. In this work, we investigate learning language-conditioned, vision-based manipulation policies wherein the action representation is in fact, contact itself – predicting contact formations at which tools grasped by the robot should meet an observable surface. Our approach, Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation (CALAMARI), exhibits several advantages including (i) benefiting from existing visual-language models for pretrained spatial features, grounding instructions to behaviors, and for sim2real transfer; and (ii) factorizing perception and control over a natural boundary (i.e. contact) into two modules that synergize with each other, whereby action predictions can be aligned per pixel with image observations, and low-level controllers can optimize motion trajectories that maintain contact while avoiding penetration. Experiments show that CALAMARI outperforms existing state-of-the-art model architectures for a broad range of contact-rich tasks, and pushes new ground on embodiment-agnostic generalization to unseen objects with varying elasticity, geometry, and colors in both simulated and real-world settings.}
}

Endnote

%0 Conference Paper
%T CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation
%A Youngsun Wi
%A Mark Van der Merwe
%A Pete Florence
%A Andy Zeng
%A Nima Fazeli
%B Proceedings of The 7th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Jie Tan
%E Marc Toussaint
%E Kourosh Darvish
%F pmlr-v229-wi23a
%I PMLR
%P 2753--2771
%U https://proceedings.mlr.press/v229/wi23a.html
%V 229
%X Making contact with purpose is a central part of robot manipulation and remains essential for many household tasks – from sweeping dust into a dustpan, to wiping tables; from erasing whiteboards, to applying paint. In this work, we investigate learning language-conditioned, vision-based manipulation policies wherein the action representation is in fact, contact itself – predicting contact formations at which tools grasped by the robot should meet an observable surface. Our approach, Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation (CALAMARI), exhibits several advantages including (i) benefiting from existing visual-language models for pretrained spatial features, grounding instructions to behaviors, and for sim2real transfer; and (ii) factorizing perception and control over a natural boundary (i.e. contact) into two modules that synergize with each other, whereby action predictions can be aligned per pixel with image observations, and low-level controllers can optimize motion trajectories that maintain contact while avoiding penetration. Experiments show that CALAMARI outperforms existing state-of-the-art model architectures for a broad range of contact-rich tasks, and pushes new ground on embodiment-agnostic generalization to unseen objects with varying elasticity, geometry, and colors in both simulated and real-world settings.

APA


Wi, Y., Van der Merwe, M., Florence, P., Zeng, A., & Fazeli, N. (2023). CALAMARI: Contact-Aware and Language conditioned spatial Action MApping for contact-RIch manipulation. Proceedings of The 7th Conference on Robot Learning, in Proceedings of Machine Learning Research 229:2753-2771. Available from https://proceedings.mlr.press/v229/wi23a.html.
