Learning to Manipulate Object Collections Using Grounded State Representations

Matthew Wilson, Tucker Hermans
Proceedings of the Conference on Robot Learning, PMLR 100:490-502, 2020.

Abstract

We propose a method for sim-to-real robot learning which exploits simulator state information in a way that scales to many objects. First, we train a pair of encoders on raw object pose targets to learn representations that accurately capture the state information of a multi-object environment. Second, we use these encoders in a reinforcement learning algorithm to train image-based policies capable of manipulating many objects. Our pair of encoders consists of one which consumes RGB images and is used in our policy network, and one which directly consumes a set of raw object poses and is used for reward calculation and value estimation. We evaluate our method on the task of pushing a collection of objects to desired tabletop regions. Compared to methods which rely only on images or use fixed-length state encodings, our method achieves higher success rates, performs well in the real world without fine tuning, and generalizes to different numbers and types of objects not seen during training. Video results: bit.ly/2khSKUs.

Cite this Paper


BibTeX
@InProceedings{pmlr-v100-wilson20a,
  title     = {Learning to Manipulate Object Collections Using Grounded State Representations},
  author    = {Wilson, Matthew and Hermans, Tucker},
  booktitle = {Proceedings of the Conference on Robot Learning},
  pages     = {490--502},
  year      = {2020},
  editor    = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
  volume    = {100},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Oct--01 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v100/wilson20a/wilson20a.pdf},
  url       = {https://proceedings.mlr.press/v100/wilson20a.html},
  abstract  = {We propose a method for sim-to-real robot learning which exploits simulator state information in a way that scales to many objects. First, we train a pair of encoders on raw object pose targets to learn representations that accurately capture the state information of a multi-object environment. Second, we use these encoders in a reinforcement learning algorithm to train image-based policies capable of manipulating many objects. Our pair of encoders consists of one which consumes RGB images and is used in our policy network, and one which directly consumes a set of raw object poses and is used for reward calculation and value estimation. We evaluate our method on the task of pushing a collection of objects to desired tabletop regions. Compared to methods which rely only on images or use fixed-length state encodings, our method achieves higher success rates, performs well in the real world without fine tuning, and generalizes to different numbers and types of objects not seen during training. Video results: bit.ly/2khSKUs.}
}
Endnote
%0 Conference Paper
%T Learning to Manipulate Object Collections Using Grounded State Representations
%A Matthew Wilson
%A Tucker Hermans
%B Proceedings of the Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Leslie Pack Kaelbling
%E Danica Kragic
%E Komei Sugiura
%F pmlr-v100-wilson20a
%I PMLR
%P 490--502
%U https://proceedings.mlr.press/v100/wilson20a.html
%V 100
%X We propose a method for sim-to-real robot learning which exploits simulator state information in a way that scales to many objects. First, we train a pair of encoders on raw object pose targets to learn representations that accurately capture the state information of a multi-object environment. Second, we use these encoders in a reinforcement learning algorithm to train image-based policies capable of manipulating many objects. Our pair of encoders consists of one which consumes RGB images and is used in our policy network, and one which directly consumes a set of raw object poses and is used for reward calculation and value estimation. We evaluate our method on the task of pushing a collection of objects to desired tabletop regions. Compared to methods which rely only on images or use fixed-length state encodings, our method achieves higher success rates, performs well in the real world without fine tuning, and generalizes to different numbers and types of objects not seen during training. Video results: bit.ly/2khSKUs.
APA
Wilson, M. & Hermans, T. (2020). Learning to Manipulate Object Collections Using Grounded State Representations. Proceedings of the Conference on Robot Learning, in Proceedings of Machine Learning Research 100:490-502. Available from https://proceedings.mlr.press/v100/wilson20a.html.