SORNet: Spatial Object-Centric Representations for Sequential Manipulation

Wentao Yuan, Chris Paxton, Karthik Desingh, Dieter Fox
Proceedings of the 5th Conference on Robot Learning, PMLR 164:148-157, 2022.

Abstract

Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects or new tasks. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.
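To make the interface described above concrete, here is a minimal PyTorch sketch of the idea: a transformer attends over scene image patches together with one query token per canonical object view, and returns one embedding per queried object. This is an illustrative assumption of the architecture, not the authors' released code; all names (ObjectConditionedEncoder, dim, patch, etc.) are hypothetical.

# Minimal sketch of the idea in the abstract: embed an RGB observation
# conditioned on canonical views of the objects of interest, yielding one
# embedding per object. Names and hyperparameters are illustrative, not
# the authors' implementation.
import torch
import torch.nn as nn

class ObjectConditionedEncoder(nn.Module):
    def __init__(self, dim=256, patch=16, n_layers=4, n_heads=8):
        super().__init__()
        # Patch embedding for the scene image.
        self.scene_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # One query token per canonical object view.
        self.query_encoder = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=patch, stride=patch),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(1),
        )
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, scene, canonical_views):
        # scene: (B, 3, H, W); canonical_views: (B, N, 3, h, w)
        B, N = canonical_views.shape[:2]
        ctx = self.scene_patches(scene).flatten(2).transpose(1, 2)      # (B, P, dim)
        queries = self.query_encoder(
            canonical_views.flatten(0, 1)).view(B, N, -1)               # (B, N, dim)
        tokens = self.transformer(torch.cat([queries, ctx], dim=1))
        return tokens[:, :N]  # one embedding per queried object

if __name__ == "__main__":
    model = ObjectConditionedEncoder()
    emb = model(torch.randn(2, 3, 224, 224), torch.randn(2, 4, 3, 32, 32))
    print(emb.shape)  # torch.Size([2, 4, 256])

Because the objects of interest enter only as query images, such a model can be asked about object entities unseen during training, which is the zero-shot generalization property the abstract claims; the returned embeddings would then feed downstream heads for relation classification, precondition checking, or direction regression.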

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-yuan22a,
  title     = {SORNet: Spatial Object-Centric Representations for Sequential Manipulation},
  author    = {Yuan, Wentao and Paxton, Chris and Desingh, Karthik and Fox, Dieter},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {148--157},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/yuan22a/yuan22a.pdf},
  url       = {https://proceedings.mlr.press/v164/yuan22a.html},
  abstract  = {Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects or new tasks. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.}
}
Endnote
%0 Conference Paper
%T SORNet: Spatial Object-Centric Representations for Sequential Manipulation
%A Wentao Yuan
%A Chris Paxton
%A Karthik Desingh
%A Dieter Fox
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-yuan22a
%I PMLR
%P 148--157
%U https://proceedings.mlr.press/v164/yuan22a.html
%V 164
%X Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs is crucial. Prior works relying on explicit state estimation or end-to-end learning struggle with novel objects or new tasks. In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest. We show that the object embeddings learned by SORNet generalize zero-shot to unseen object entities on three spatial reasoning tasks: spatial relationship classification, skill precondition classification and relative direction regression, significantly outperforming baselines. Further, we present real-world robotic experiments demonstrating the usage of the learned object embeddings in task planning for sequential manipulation.
APA
Yuan, W., Paxton, C., Desingh, K. & Fox, D. (2022). SORNet: Spatial Object-Centric Representations for Sequential Manipulation. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:148-157. Available from https://proceedings.mlr.press/v164/yuan22a.html.