Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views

Jingyun Yang; Hsiao-Yu Tung; Yunchu Zhang; Gaurav Pathak; Ashwini Pokle; Christopher G Atkeson; Katerina Fragkiadaki

Proceedings of the 5th Conference on Robot Learning, PMLR 164:695-705, 2022.

Abstract

We propose a visually-grounded library of behaviors approach for learning to manipulate diverse objects across varying initial and goal configurations and camera placements. Our key innovation is to disentangle the standard image-to-action mapping into two separate modules that use different types of perceptual input: (1) a behavior selector which conditions on intrinsic and semantically-rich object appearance features to select the behaviors that can successfully perform the desired tasks on the object in hand, and (2) a library of behaviors, each of which conditions on extrinsic and abstract object properties, such as object location and pose, to predict actions to execute over time. The selector uses a semantically-rich 3D object feature representation extracted from images in a differentiable end-to-end manner. This representation is trained to be view-invariant and affordance-aware using self-supervision, by predicting varying views and successful object manipulations. We test our framework on pushing and grasping diverse objects in simulation as well as transporting rigid, granular, and liquid food ingredients in a real robot setup. Our model outperforms image-to-action mappings that do not factorize static and dynamic object properties. We further ablate the contribution of the selector’s input and show the benefits of the proposed view-predictive, affordance-aware 3D visual object representations.
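
The two-module factorization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the behavior names, the `ObjectState` fields, the symbolic action strings, and the linear scoring rule that stands in for the learned selector are all assumptions made for illustration. It only shows the split between a selector that reads appearance features and behaviors that read object location and pose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class ObjectState:
    """Extrinsic object properties (hypothetical fields)."""
    position: Tuple[float, float, float]  # (x, y, z) in some workspace frame
    yaw: float                            # rotation about the vertical axis, radians


# A behavior maps extrinsic state to a sequence of symbolic actions.
Behavior = Callable[[ObjectState], List[str]]


def top_grasp(state: ObjectState) -> List[str]:
    # Hypothetical behavior: approach from above and close the gripper.
    x, y, z = state.position
    return [f"move_above({x:.2f}, {y:.2f})", f"descend_to({z:.2f})", "close_gripper"]


def side_push(state: ObjectState) -> List[str]:
    # Hypothetical behavior: approach from the side and push along the yaw direction.
    x, y, _ = state.position
    return [f"move_beside({x:.2f}, {y:.2f})", f"push_along(yaw={state.yaw:.2f})"]


# The library: every behavior sees only location and pose, never appearance.
LIBRARY: Dict[str, Behavior] = {"top_grasp": top_grasp, "side_push": side_push}


def select_behavior(appearance: Sequence[float],
                    weights: Dict[str, Sequence[float]]) -> str:
    """Pick the behavior with the highest score for these appearance features.

    A dot product stands in for the learned selector (an assumption).
    """
    def score(name: str) -> float:
        return sum(a * w for a, w in zip(appearance, weights[name]))
    return max(LIBRARY, key=score)


def act(appearance: Sequence[float], state: ObjectState,
        weights: Dict[str, Sequence[float]]) -> Tuple[str, List[str]]:
    # Selector uses intrinsic features; the chosen behavior uses extrinsic state.
    name = select_behavior(appearance, weights)
    return name, LIBRARY[name](state)
```

In this sketch, changing the object's position or yaw alters the actions a behavior emits but never which behavior is chosen, while changing the appearance features can alter the choice but not how a given behavior computes its actions.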

Cite this Paper

BibTeX


@InProceedings{pmlr-v164-yang22c,
  title = {Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views},
  author = {Yang, Jingyun and Tung, Hsiao-Yu and Zhang, Yunchu and Pathak, Gaurav and Pokle, Ashwini and Atkeson, Christopher G and Fragkiadaki, Katerina},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages = {695--705},
  year = {2022},
  editor = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume = {164},
  series = {Proceedings of Machine Learning Research},
  month = {08--11 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v164/yang22c/yang22c.pdf},
  url = {https://proceedings.mlr.press/v164/yang22c.html},
  abstract = {We propose a visually-grounded library of behaviors approach for learning to manipulate diverse objects across varying initial and goal configurations and camera placements. Our key innovation is to disentangle the standard image-to-action mapping into two separate modules that use different types of perceptual input: (1) a behavior selector which conditions on intrinsic and semantically-rich object appearance features to select the behaviors that can successfully perform the desired tasks on the object in hand, and (2) a library of behaviors, each of which conditions on extrinsic and abstract object properties, such as object location and pose, to predict actions to execute over time. The selector uses a semantically-rich 3D object feature representation extracted from images in a differentiable end-to-end manner. This representation is trained to be view-invariant and affordance-aware using self-supervision, by predicting varying views and successful object manipulations. We test our framework on pushing and grasping diverse objects in simulation as well as transporting rigid, granular, and liquid food ingredients in a real robot setup. Our model outperforms image-to-action mappings that do not factorize static and dynamic object properties. We further ablate the contribution of the selector’s input and show the benefits of the proposed view-predictive, affordance-aware 3D visual object representations.}
}

Endnote

%0 Conference Paper
%T Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views
%A Jingyun Yang
%A Hsiao-Yu Tung
%A Yunchu Zhang
%A Gaurav Pathak
%A Ashwini Pokle
%A Christopher G Atkeson
%A Katerina Fragkiadaki
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-yang22c
%I PMLR
%P 695--705
%U https://proceedings.mlr.press/v164/yang22c.html
%V 164
%X We propose a visually-grounded library of behaviors approach for learning to manipulate diverse objects across varying initial and goal configurations and camera placements. Our key innovation is to disentangle the standard image-to-action mapping into two separate modules that use different types of perceptual input: (1) a behavior selector which conditions on intrinsic and semantically-rich object appearance features to select the behaviors that can successfully perform the desired tasks on the object in hand, and (2) a library of behaviors, each of which conditions on extrinsic and abstract object properties, such as object location and pose, to predict actions to execute over time. The selector uses a semantically-rich 3D object feature representation extracted from images in a differentiable end-to-end manner. This representation is trained to be view-invariant and affordance-aware using self-supervision, by predicting varying views and successful object manipulations. We test our framework on pushing and grasping diverse objects in simulation as well as transporting rigid, granular, and liquid food ingredients in a real robot setup. Our model outperforms image-to-action mappings that do not factorize static and dynamic object properties. We further ablate the contribution of the selector’s input and show the benefits of the proposed view-predictive, affordance-aware 3D visual object representations.

APA


Yang, J., Tung, H., Zhang, Y., Pathak, G., Pokle, A., Atkeson, C.G. & Fragkiadaki, K. (2022). Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:695-705. Available from https://proceedings.mlr.press/v164/yang22c.html.
