Learning Visually Guided Latent Actions for Assistive Teleoperation
Proceedings of the 3rd Conference on Learning for Dynamics and Control, PMLR 144:1230-1241, 2021.
It is challenging for humans — particularly people living with physical disabilities — to control high-dimensional and dexterous robots. Prior work explores how robots can learn embedding functions that map a human’s low-dimensional inputs (e.g., via a joystick) to complex, high-dimensional robot actions for assistive teleoperation; unfortunately, there are many more high-dimensional actions than available low-dimensional inputs! To extract the correct action and maximally assist their human controller, robots must reason over their current context: for example, pressing a joystick right when interacting with a coffee cup indicates a different action than when interacting with food. In this work, we develop assistive robots that condition their latent embeddings on visual inputs. We explore a spectrum of plausible visual encoders and show that incorporating object detectors pretrained on a small amount of cheap and easy-to-collect structured data enables i) accurately and robustly recognizing the current context and ii) generalizing control embeddings to new objects and tasks. In user studies with a high-dimensional physical robot arm, participants leverage this approach to perform new tasks with unseen objects. Our results indicate that structured visual representations improves few-shot performance and is subjectively preferred by users.