Preference Learning in Assistive Robotics: Observational Repeated Inverse Reinforcement Learning
Proceedings of the 3rd Machine Learning for Healthcare Conference, PMLR 85:420-439, 2018.
As robots become more affordable and more common in everyday life, particularly in assistive contexts, there will be an ever-increasing demand for adaptive behavior that is personalized to the individual needs of users. To accomplish this, robots will need to learn about their users’ unique preferences through interaction. Current preference learning techniques lack the ability to infer long-term, task-independent preferences in realistic, interactive, incomplete-information settings. To address this gap, we introduce a novel preference-inference formulation, inspired by assistive robotics applications, in which a robot must infer these kinds of preferences based only on observing the user’s behavior in various tasks. We then propose a candidate inference algorithm based on maximum-margin methods, and evaluate its performance in the context of robot-assisted prehabilitation. We find that the algorithm learns to predict aspects of the user’s behavior as it is given more data, and that it shows strong convergence properties after a small number of iterations.