Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation

Suraj Nair, Eric Mitchell, Kevin Chen, brian ichter, Silvio Savarese, Chelsea Finn
Proceedings of the 5th Conference on Robot Learning, PMLR 164:1303-1315, 2022.

Abstract

We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. In order to accomplish this, humans need easy and effective ways of specifying tasks to the robot. Goal images are one popular form of task specification, as they are already grounded in the robot’s observation space. However, goal images also have a number of drawbacks: they are inconvenient for humans to provide, they can over-specify the desired behavior leading to a sparse reward signal, or under-specify task information in the case of non-goal reaching tasks. Natural language provides a convenient and flexible alternative for task specification, but comes with the challenge of grounding language in the robot’s observation space. To scalably learn this grounding we propose to leverage offline pre-collected robotic datasets (including highly sub-optimal, autonomously-collected data) with crowd-sourced natural language labels. With this data, we learn a simple classifier which predicts if a change in state completes a language instruction. This provides a language-conditioned reward function that can then be used for offline multi-task RL. In our experiments, we find that on language-conditioned manipulation tasks our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%, and is able to perform a range of visuomotor tasks from natural language, such as “open the right drawer” and “move the stapler”, on a Franka Emika Panda robot.
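The core mechanism the abstract describes — a classifier that predicts whether the change from an initial state to the current state completes a language instruction, then serves as a reward for offline multi-task RL — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the names (`CompletionClassifier`, `relabel_with_language_reward`), the toy language featurization, and the linear classifier are all assumptions standing in for the paper's learned image and language encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_language(instruction: str, dim: int = 16) -> np.ndarray:
    # Toy deterministic stand-in for a learned sentence embedding (assumption).
    seed = abs(hash(instruction)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

class CompletionClassifier:
    """Scores whether the change from s0 to st completes the instruction."""

    def __init__(self, state_dim: int, lang_dim: int = 16):
        # A single linear layer over [s0, st, language] features -- a sketch,
        # not the paper's convolutional architecture.
        self.w = rng.standard_normal(2 * state_dim + lang_dim) * 0.01
        self.b = 0.0

    def predict(self, s0: np.ndarray, st: np.ndarray, lang: np.ndarray) -> float:
        x = np.concatenate([s0, st, lang])
        # Sigmoid output: probability that the instruction is completed.
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

def relabel_with_language_reward(trajectory, instruction, clf):
    """Turn an unlabeled offline trajectory of (state, action, next_state)
    transitions into (state, action, reward) tuples for offline RL."""
    lang = embed_language(instruction)
    s0 = trajectory[0][0]  # initial state of the episode
    return [(s, a, clf.predict(s0, s_next, lang))
            for (s, a, s_next) in trajectory]

# Usage: relabel a dummy 3-step trajectory with a language-conditioned reward.
clf = CompletionClassifier(state_dim=4)
traj = [(rng.standard_normal(4), rng.standard_normal(2), rng.standard_normal(4))
        for _ in range(3)]
labeled = relabel_with_language_reward(traj, "open the right drawer", clf)
```

In the paper, the classifier is trained on crowd-sourced (video, instruction) annotations over the offline dataset; here the weights are random, since the point is only the data flow from classifier output to reward signal.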

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-nair22a,
  title     = {Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation},
  author    = {Nair, Suraj and Mitchell, Eric and Chen, Kevin and ichter, brian and Savarese, Silvio and Finn, Chelsea},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {1303--1315},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/nair22a/nair22a.pdf},
  url       = {https://proceedings.mlr.press/v164/nair22a.html},
  abstract  = {We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. In order to accomplish this, humans need easy and effective ways of specifying tasks to the robot. Goal images are one popular form of task specification, as they are already grounded in the robot’s observation space. However, goal images also have a number of drawbacks: they are inconvenient for humans to provide, they can over-specify the desired behavior leading to a sparse reward signal, or under-specify task information in the case of non-goal reaching tasks. Natural language provides a convenient and flexible alternative for task specification, but comes with the challenge of grounding language in the robot’s observation space. To scalably learn this grounding we propose to leverage offline pre-collected robotic datasets (including highly sub-optimal, autonomously-collected data) with crowd-sourced natural language labels. With this data, we learn a simple classifier which predicts if a change in state completes a language instruction. This provides a language-conditioned reward function that can then be used for offline multi-task RL. In our experiments, we find that on language-conditioned manipulation tasks our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%, and is able to perform a range of visuomotor tasks from natural language, such as “open the right drawer” and “move the stapler”, on a Franka Emika Panda robot.}
}
Endnote
%0 Conference Paper
%T Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
%A Suraj Nair
%A Eric Mitchell
%A Kevin Chen
%A brian ichter
%A Silvio Savarese
%A Chelsea Finn
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-nair22a
%I PMLR
%P 1303--1315
%U https://proceedings.mlr.press/v164/nair22a.html
%V 164
%X We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction. In order to accomplish this, humans need easy and effective ways of specifying tasks to the robot. Goal images are one popular form of task specification, as they are already grounded in the robot’s observation space. However, goal images also have a number of drawbacks: they are inconvenient for humans to provide, they can over-specify the desired behavior leading to a sparse reward signal, or under-specify task information in the case of non-goal reaching tasks. Natural language provides a convenient and flexible alternative for task specification, but comes with the challenge of grounding language in the robot’s observation space. To scalably learn this grounding we propose to leverage offline pre-collected robotic datasets (including highly sub-optimal, autonomously-collected data) with crowd-sourced natural language labels. With this data, we learn a simple classifier which predicts if a change in state completes a language instruction. This provides a language-conditioned reward function that can then be used for offline multi-task RL. In our experiments, we find that on language-conditioned manipulation tasks our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%, and is able to perform a range of visuomotor tasks from natural language, such as “open the right drawer” and “move the stapler”, on a Franka Emika Panda robot.
APA
Nair, S., Mitchell, E., Chen, K., ichter, b., Savarese, S. & Finn, C. (2022). Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1303-1315. Available from https://proceedings.mlr.press/v164/nair22a.html.