Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin
Proceedings of the 5th Conference on Robot Learning, PMLR 164:1259-1268, 2022.

Abstract

A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations. However, such generative models inherit the biases of the underlying data and result in poor and unusable skills when trained on imperfect demonstration data. To better align skill extraction with human intent we present Skill Preferences (SkiP), an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data. After extracting human-preferred skills, SkiP also utilizes human feedback to solve downstream tasks with RL. We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks and substantially outperforms prior leading RL algorithms with human preferences as well as leading skill extraction algorithms without human preferences.
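The abstract's central ingredient is a learned model over human preferences between behavior segments. A minimal sketch of that idea, assuming a Bradley-Terry preference model with a linear per-step reward — both illustrative assumptions, not the paper's actual architecture:

```python
# Hypothetical sketch of preference learning over behavior segments:
# fit a reward model so the segment a human preferred scores higher
# (Bradley-Terry). The linear reward and all names are illustrative.
import numpy as np

def preference_loss(w, seg_a, seg_b, pref):
    """Negative log-likelihood of a human preference under Bradley-Terry.

    seg_a, seg_b: (T, d) arrays of per-step features for two segments.
    pref: 1.0 if the human preferred segment A, 0.0 if segment B.
    """
    logit = (seg_a @ w).sum() - (seg_b @ w).sum()  # return difference
    p_a = 1.0 / (1.0 + np.exp(-logit))             # P(A preferred | w)
    return -(pref * np.log(p_a + 1e-8) + (1 - pref) * np.log(1 - p_a + 1e-8))

def grad_step(w, seg_a, seg_b, pref, lr=0.1):
    """One gradient step on the preference loss (closed-form gradient)."""
    logit = (seg_a @ w).sum() - (seg_b @ w).sum()
    p_a = 1.0 / (1.0 + np.exp(-logit))
    # d(loss)/d(logit) = p_a - pref; chain rule through the linear reward
    g = (p_a - pref) * (seg_a.sum(axis=0) - seg_b.sum(axis=0))
    return w - lr * g

rng = np.random.default_rng(0)
w = np.zeros(4)
seg_good = rng.normal(0.5, 1.0, size=(10, 4))   # segment the human prefers
seg_bad = rng.normal(-0.5, 1.0, size=(10, 4))
before = preference_loss(w, seg_good, seg_bad, pref=1.0)
for _ in range(50):
    w = grad_step(w, seg_good, seg_bad, pref=1.0)
after = preference_loss(w, seg_good, seg_bad, pref=1.0)
```

In SkiP's framing, such a preference model would then gate which demonstration segments are treated as useful skills during extraction and shape rewards for the downstream RL stage; the details above are a generic sketch, not the paper's implementation.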

Cite this Paper


BibTeX
@InProceedings{pmlr-v164-wang22g,
  title     = {Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback},
  author    = {Wang, Xiaofei and Lee, Kimin and Hakhamaneshi, Kourosh and Abbeel, Pieter and Laskin, Michael},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {1259--1268},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/wang22g/wang22g.pdf},
  url       = {https://proceedings.mlr.press/v164/wang22g.html},
  abstract  = {A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations. However, such generative models inherit the biases of the underlying data and result in poor and unusable skills when trained on imperfect demonstration data. To better align skill extraction with human intent we present Skill Preferences (SkiP), an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data. After extracting human-preferred skills, SkiP also utilizes human feedback to solve downstream tasks with RL. We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks and substantially outperforms prior leading RL algorithms with human preferences as well as leading skill extraction algorithms without human preferences.}
}
Endnote
%0 Conference Paper
%T Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback
%A Xiaofei Wang
%A Kimin Lee
%A Kourosh Hakhamaneshi
%A Pieter Abbeel
%A Michael Laskin
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-wang22g
%I PMLR
%P 1259--1268
%U https://proceedings.mlr.press/v164/wang22g.html
%V 164
%X A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations. However, such generative models inherit the biases of the underlying data and result in poor and unusable skills when trained on imperfect demonstration data. To better align skill extraction with human intent we present Skill Preferences (SkiP), an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data. After extracting human-preferred skills, SkiP also utilizes human feedback to solve downstream tasks with RL. We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks and substantially outperforms prior leading RL algorithms with human preferences as well as leading skill extraction algorithms without human preferences.
APA
Wang, X., Lee, K., Hakhamaneshi, K., Abbeel, P. & Laskin, M. (2022). Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:1259-1268. Available from https://proceedings.mlr.press/v164/wang22g.html.