Few-Shot Preference Learning for Human-in-the-Loop RL

Donald Joseph Hejna III, Dorsa Sadigh
Proceedings of The 6th Conference on Robot Learning, PMLR 205:2014-2025, 2023.

Abstract

While reinforcement learning (RL) has become a more popular approach for robotics, designing sufficiently informative reward functions for complex tasks has proven to be extremely difficult due to their inability to capture human intent and their susceptibility to policy exploitation. Preference-based RL algorithms seek to overcome these challenges by directly learning reward functions from human feedback. Unfortunately, prior work either requires an unreasonable number of queries implausible for any human to answer or overly restricts the class of reward functions to guarantee the elicitation of the most informative queries, resulting in models that are insufficiently expressive for realistic robotics tasks. Contrary to most works that focus on query selection to \emph{minimize} the amount of data required for learning reward functions, we take an opposite approach: \emph{expanding} the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning. Motivated by the success of meta-learning, we pre-train preference models on prior task data and quickly adapt them for new tasks using only a handful of queries. Empirically, we reduce the amount of online feedback needed to train manipulation policies in Meta-World by 20$\times$, and demonstrate the effectiveness of our method on a real Franka Panda robot. Moreover, this reduction in query complexity allows us to train robot policies from actual human users. Videos of our results can be found at https://sites.google.com/view/few-shot-preference-rl/home.
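As a rough illustration of the pre-train-then-adapt recipe the abstract describes, the sketch below fits a reward model to pairwise preference queries with a Bradley-Terry likelihood (the standard choice in preference-based RL) and then fine-tunes it on a handful of new-task queries. This is a minimal sketch, not the authors' released implementation: the names (RewardMLP, preference_loss, adapt_reward_model) are hypothetical, and plain fine-tuning stands in for the paper's meta-learning adaptation procedure.

# Minimal sketch (not the authors' code): few-shot adaptation of a learned
# reward model from pairwise preferences, using a Bradley-Terry likelihood.
# All names here are hypothetical placeholders.
import copy

import torch
import torch.nn as nn


class RewardMLP(nn.Module):
    """Maps a (state, action) pair to a scalar predicted reward."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def preference_loss(model: RewardMLP, seg_a: tuple, seg_b: tuple,
                    labels: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: P(a preferred over b) = sigmoid(sum r(a) - sum r(b)).

    seg_a and seg_b are (obs, act) tensors of shape (batch, T, dim);
    labels[i] is 1.0 if segment a was preferred in query i, else 0.0.
    """
    ret_a = model(*seg_a).sum(dim=-1)  # summed predicted reward per segment
    ret_b = model(*seg_b).sum(dim=-1)
    logits = ret_a - ret_b
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)


def adapt_reward_model(pretrained: RewardMLP, queries: list,
                       steps: int = 50, lr: float = 1e-3) -> RewardMLP:
    """Fine-tune a pre-trained reward model on a handful of new-task queries.

    queries is a list of (seg_a, seg_b, labels) preference tuples.
    """
    model = copy.deepcopy(pretrained)  # keep the pre-trained weights intact
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        for seg_a, seg_b, labels in queries:
            opt.zero_grad()
            loss = preference_loss(model, seg_a, seg_b, labels)
            loss.backward()
            opt.step()
    return model

In the paper's setting, the pre-trained model would come from preference data gathered across many prior tasks, so that only a few dozen queries on the target task are needed before the adapted reward model can supervise policy learning.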

Cite this Paper


BibTeX
@InProceedings{pmlr-v205-iii23a,
  title     = {Few-Shot Preference Learning for Human-in-the-Loop RL},
  author    = {Hejna III, Donald Joseph and Sadigh, Dorsa},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  pages     = {2014--2025},
  year      = {2023},
  editor    = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume    = {205},
  series    = {Proceedings of Machine Learning Research},
  month     = {14--18 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v205/iii23a/iii23a.pdf},
  url       = {https://proceedings.mlr.press/v205/iii23a.html}
}
Endnote
%0 Conference Paper
%T Few-Shot Preference Learning for Human-in-the-Loop RL
%A Donald Joseph Hejna III
%A Dorsa Sadigh
%B Proceedings of The 6th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Karen Liu
%E Dana Kulic
%E Jeff Ichnowski
%F pmlr-v205-iii23a
%I PMLR
%P 2014--2025
%U https://proceedings.mlr.press/v205/iii23a.html
%V 205
APA
Hejna III, D. J., & Sadigh, D. (2023). Few-Shot Preference Learning for Human-in-the-Loop RL. Proceedings of The 6th Conference on Robot Learning, in Proceedings of Machine Learning Research 205:2014-2025. Available from https://proceedings.mlr.press/v205/iii23a.html.