Preference Learning in Assistive Robotics: Observational Repeated Inverse Reinforcement Learning

Bryce Woodworth, Francesco Ferrari, Teofilo E. Zosa, Laurel D. Riek
Proceedings of the 3rd Machine Learning for Healthcare Conference, PMLR 85:420-439, 2018.

Abstract

As robots become more affordable and more common in everyday life, particularly in assistive contexts, there will be an ever-increasing demand for adaptive behavior that is personalized to the individual needs of users. To accomplish this, robots will need to learn about their users’ unique preferences through interaction. Current preference learning techniques lack the ability to infer long-term, task-independent preferences in realistic, interactive, incomplete-information settings. To address this gap, we introduce a novel preference-inference formulation, inspired by assistive robotics applications, in which a robot must infer these kinds of preferences based only on observing the user’s behavior in various tasks. We then propose a candidate inference algorithm based on maximum-margin methods, and evaluate its performance in the context of robot-assisted prehabilitation. We find that the algorithm learns to predict aspects of the user’s behavior as it is given more data, and that it shows strong convergence properties after a small number of iterations.
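The abstract describes the inference algorithm only as "based on maximum-margin methods." As a rough illustration of that general idea (a sketch in the style of max-margin apprenticeship learning, not the authors' actual algorithm), the snippet below estimates reward weights that separate an observed user's feature expectations from those of alternative policies by a maximal margin, via projected subgradient ascent. The names (max_margin_weights, mu_expert, mu_candidates) and the optimization scheme are illustrative assumptions.

import numpy as np

def max_margin_weights(mu_expert, mu_candidates, n_iters=1000, lr=0.05):
    # Illustrative sketch only: projected subgradient ascent on the
    # smallest margin min_i w . (mu_expert - mu_candidates[i]),
    # with w constrained to the unit ball (||w|| <= 1).
    # mu_expert: (d,) empirical feature expectations of observed behavior.
    # mu_candidates: (k, d) feature expectations of alternative policies.
    w = np.zeros_like(mu_expert)
    for _ in range(n_iters):
        margins = (mu_expert - mu_candidates) @ w        # shape (k,)
        worst = np.argmin(margins)                       # tightest constraint
        w = w + lr * (mu_expert - mu_candidates[worst])  # subgradient step
        norm = np.linalg.norm(w)
        if norm > 1.0:
            w = w / norm                                 # project onto unit ball
    return w

# Toy usage with made-up feature expectations:
mu_E = np.array([0.8, 0.2, 0.5])
mu_alt = np.array([[0.3, 0.6, 0.4],
                   [0.5, 0.5, 0.1]])
print(max_margin_weights(mu_E, mu_alt))

In max-margin inverse RL more broadly, the recovered weights are typically used to compute a new candidate policy whose feature expectations are added to the candidate set, and the loop repeats until the margin falls below a tolerance.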

Cite this Paper


BibTeX
@InProceedings{pmlr-v85-woodworth18a,
  title = {Preference Learning in Assistive Robotics: Observational Repeated Inverse Reinforcement Learning},
  author = {Woodworth, Bryce and Ferrari, Francesco and Zosa, Teofilo E. and Riek, Laurel D.},
  booktitle = {Proceedings of the 3rd Machine Learning for Healthcare Conference},
  pages = {420--439},
  year = {2018},
  editor = {Doshi-Velez, Finale and Fackler, Jim and Jung, Ken and Kale, David and Ranganath, Rajesh and Wallace, Byron and Wiens, Jenna},
  volume = {85},
  series = {Proceedings of Machine Learning Research},
  month = {17--18 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v85/woodworth18a/woodworth18a.pdf},
  url = {https://proceedings.mlr.press/v85/woodworth18a.html},
  abstract = {As robots become more affordable and more common in everyday life, particularly in assistive contexts, there will be an ever-increasing demand for adaptive behavior that is personalized to the individual needs of users. To accomplish this, robots will need to learn about their users’ unique preferences through interaction. Current preference learning techniques lack the ability to infer long-term, task-independent preferences in realistic, interactive, incomplete-information settings. To address this gap, we introduce a novel preference-inference formulation, inspired by assistive robotics applications, in which a robot must infer these kinds of preferences based only on observing the user’s behavior in various tasks. We then propose a candidate inference algorithm based on maximum-margin methods, and evaluate its performance in the context of robot-assisted prehabilitation. We find that the algorithm learns to predict aspects of the user’s behavior as it is given more data, and that it shows strong convergence properties after a small number of iterations.}
}
Endnote
%0 Conference Paper
%T Preference Learning in Assistive Robotics: Observational Repeated Inverse Reinforcement Learning
%A Bryce Woodworth
%A Francesco Ferrari
%A Teofilo E. Zosa
%A Laurel D. Riek
%B Proceedings of the 3rd Machine Learning for Healthcare Conference
%C Proceedings of Machine Learning Research
%D 2018
%E Finale Doshi-Velez
%E Jim Fackler
%E Ken Jung
%E David Kale
%E Rajesh Ranganath
%E Byron Wallace
%E Jenna Wiens
%F pmlr-v85-woodworth18a
%I PMLR
%P 420--439
%U https://proceedings.mlr.press/v85/woodworth18a.html
%V 85
%X As robots become more affordable and more common in everyday life, particularly in assistive contexts, there will be an ever-increasing demand for adaptive behavior that is personalized to the individual needs of users. To accomplish this, robots will need to learn about their users’ unique preferences through interaction. Current preference learning techniques lack the ability to infer long-term, task-independent preferences in realistic, interactive, incomplete-information settings. To address this gap, we introduce a novel preference-inference formulation, inspired by assistive robotics applications, in which a robot must infer these kinds of preferences based only on observing the user’s behavior in various tasks. We then propose a candidate inference algorithm based on maximum-margin methods, and evaluate its performance in the context of robot-assisted prehabilitation. We find that the algorithm learns to predict aspects of the user’s behavior as it is given more data, and that it shows strong convergence properties after a small number of iterations.
APA
Woodworth, B., Ferrari, F., Zosa, T.E. & Riek, L.D. (2018). Preference Learning in Assistive Robotics: Observational Repeated Inverse Reinforcement Learning. Proceedings of the 3rd Machine Learning for Healthcare Conference, in Proceedings of Machine Learning Research 85:420-439. Available from https://proceedings.mlr.press/v85/woodworth18a.html.