LILA: Language-Informed Latent Actions

Siddharth Karamcheti, Megha Srivastava, Percy Liang, Dorsa Sadigh
Proceedings of the 5th Conference on Robot Learning, PMLR 164:1379-1390, 2022.

Abstract

We introduce Language-Informed Latent Actions (LILA), a framework for learning natural language interfaces in the context of human-robot collaboration. LILA falls under the shared autonomy paradigm: in addition to providing discrete language inputs, humans are given a low-dimensional controller – e.g., a 2-degree-of-freedom (DoF) joystick that can move left/right and up/down – for operating the robot. LILA learns to use language to modulate this controller, providing users with a language-informed control space: given an instruction like "place the cereal bowl on the tray," LILA may learn a 2-DoF space where one dimension controls the distance from the robot's end-effector to the bowl, and the other dimension controls the robot's end-effector pose relative to the grasp point on the bowl. We evaluate LILA with real-world user studies, where users can provide a language instruction while operating a 7-DoF Franka Emika Panda Arm to complete a series of complex manipulation tasks. We show that LILA models are not only more sample-efficient and performant than imitation learning and end-effector control baselines, but that they are also qualitatively preferred by users.
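
Latent-actions approaches of this kind are typically trained as conditional autoencoders over demonstrations: an encoder compresses each demonstrated 7-DoF action into the 2-DoF latent space, and a decoder reconstructs the action from the robot state, a language embedding, and that latent; at test time the joystick supplies the latent directly. The following PyTorch sketch illustrates the idea under that assumption; all class names, dimensions, and the MLP architectures are illustrative, not the paper's exact model.

import torch
import torch.nn as nn

class ActionEncoder(nn.Module):
    """Compresses a demonstrated 7-DoF action into the 2-DoF latent space."""
    def __init__(self, state_dim=7, action_dim=7, latent_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim), nn.Tanh(),  # match the joystick's [-1, 1] range
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class LatentActionDecoder(nn.Module):
    """Maps (state, language embedding, 2-DoF latent z) to a 7-DoF command."""
    def __init__(self, state_dim=7, lang_dim=768, latent_dim=2,
                 action_dim=7, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + lang_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, lang_emb, z):
        # At training time z comes from ActionEncoder; at test time, from the joystick.
        return self.net(torch.cat([state, lang_emb, z], dim=-1))

def reconstruction_loss(encoder, decoder, state, lang_emb, expert_action):
    # Train both networks to reconstruct the expert action through the
    # 2-DoF bottleneck, conditioned on robot state and instruction embedding.
    z = encoder(state, expert_action)
    return ((decoder(state, lang_emb, z) - expert_action) ** 2).mean()

At deployment only the decoder runs in the loop: the user's instruction is embedded once, and each joystick reading becomes z, so the two axes take on instruction-specific meanings like the bowl-distance and grasp-pose dimensions described in the abstract.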

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-karamcheti22a,
  title     = {LILA: Language-Informed Latent Actions},
  author    = {Karamcheti, Siddharth and Srivastava, Megha and Liang, Percy and Sadigh, Dorsa},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {1379--1390},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/karamcheti22a/karamcheti22a.pdf},
  url       = {https://proceedings.mlr.press/v164/karamcheti22a.html}
}
APA
Karamcheti, S., Srivastava, M., Liang, P. & Sadigh, D. (2022). LILA: Language-Informed Latent Actions. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1379-1390. Available from https://proceedings.mlr.press/v164/karamcheti22a.html.