Batch Active Preference-Based Learning of Reward Functions

Erdem Biyik; Dorsa Sadigh

Proceedings of The 2nd Conference on Robot Learning, PMLR 87:519-528, 2018.

Abstract

Data generation and labeling are usually an expensive part of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by querying users with preference questions. In this paper, we will develop a new algorithm, batch active preference-based learning, that enables efficient learning of reward functions using as few data samples as possible while still having short query generation times. We introduce several approximations to the batch active learning problem, and provide theoretical guarantees for the convergence of our algorithms. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We then showcase our algorithm in a study to learn human users’ preferences.
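To make the idea in the abstract concrete, here is a minimal, self-contained sketch of preference-based reward learning with batched queries. It is not the authors' algorithm, and it makes several assumptions. The reward is linear in trajectory features, R(ξ) = w · φ(ξ), with w a unit vector. A query compares two trajectories through their feature difference psi = φ(A) − φ(B). A logistic likelihood, P(A preferred | w) = sigmoid(w · psi), stands in for the paper's answer model. The posterior over w is a set of samples updated by importance resampling with jitter, which may not be the sampler used in the paper. Each batch is chosen greedily by the uncertainty score min(p, 1 − p), with a hypothetical minimum-distance filter for diversity; the paper's own batch-selection approximations are not reproduced here. All function names, `w_true`, the candidate pool and the numeric settings are illustrative.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def answer_prob(samples, psi):
    # Posterior-mean probability that the user prefers trajectory A over B,
    # with psi = phi(A) - phi(B) and P(A preferred | w) = sigmoid(w . psi).
    return sum(sigmoid(dot(w, psi)) for w in samples) / len(samples)


def query_value(samples, psi):
    # Uncertainty score min(p, 1 - p): it peaks at 0.5 when the current
    # samples are split evenly on how the user will answer.
    p = answer_prob(samples, psi)
    return min(p, 1.0 - p)


def select_batch(samples, candidates, batch_size, min_dist):
    # Greedy batch: rank candidates by score, skipping any candidate closer than
    # min_dist to one already chosen (a simple, hypothetical diversity filter).
    ranked = sorted(candidates, key=lambda psi: query_value(samples, psi), reverse=True)
    batch = []
    for psi in ranked:
        if all(math.dist(psi, q) >= min_dist for q in batch):
            batch.append(psi)
            if len(batch) == batch_size:
                break
    return batch


def update(samples, psi, prefers_a, rng, jitter=0.05):
    # Importance-resample the samples by the likelihood of the observed answer,
    # then perturb and renormalize each one to keep the sample set diverse.
    weights = [sigmoid(dot(w, psi) if prefers_a else -dot(w, psi)) for w in samples]
    resampled = rng.choices(samples, weights=weights, k=len(samples))
    return [normalize([x + rng.gauss(0.0, jitter) for x in w]) for w in resampled]


def run_demo(seed=0, n_batches=5, batch_size=5):
    rng = random.Random(seed)
    w_true = [0.6, -0.8, 0.0]  # hypothetical user weights (unit norm)
    d = len(w_true)
    # Prior: unit-norm weight vectors drawn uniformly on the sphere.
    samples = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(500)]
    for _ in range(n_batches):
        # Hypothetical candidate pool of feature differences phi(A) - phi(B).
        candidates = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(200)]
        # The whole batch is chosen before any of its answers are observed.
        for psi in select_batch(samples, candidates, batch_size, min_dist=0.3):
            prefers_a = dot(w_true, psi) > 0  # simulated, noiseless user
            samples = update(samples, psi, prefers_a, rng)
    w_hat = normalize([sum(col) / len(samples) for col in zip(*samples)])
    return dot(w_hat, w_true)  # cosine similarity to the true weights


print(f"alignment with true weights: {run_demo():.3f}")
```

Choosing a whole batch before any answers arrive is what lets query generation run less often. The trade-off is that later queries in a batch cannot use the answers to earlier ones, so some diversity mechanism is needed to keep the batch from asking near-duplicate questions.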

Cite this Paper

BibTeX


@InProceedings{pmlr-v87-biyik18a,
  title     = {Batch Active Preference-Based Learning of Reward Functions},
  author    = {Biyik, Erdem and Sadigh, Dorsa},
  booktitle = {Proceedings of The 2nd Conference on Robot Learning},
  pages     = {519--528},
  year      = {2018},
  editor    = {Billard, Aude and Dragan, Anca and Peters, Jan and Morimoto, Jun},
  volume    = {87},
  series    = {Proceedings of Machine Learning Research},
  month     = {29--31 Oct},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v87/biyik18a/biyik18a.pdf},
  url       = {https://proceedings.mlr.press/v87/biyik18a.html},
  abstract  = {Data generation and labeling are usually an expensive part of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by querying users with preference questions. In this paper, we will develop a new algorithm, batch active preference-based learning, that enables efficient learning of reward functions using as few data samples as possible while still having short query generation times. We introduce several approximations to the batch active learning problem, and provide theoretical guarantees for the convergence of our algorithms. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We then showcase our algorithm in a study to learn human users’ preferences.}
}

Endnote

%0 Conference Paper
%T Batch Active Preference-Based Learning of Reward Functions
%A Erdem Biyik
%A Dorsa Sadigh
%B Proceedings of The 2nd Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Aude Billard
%E Anca Dragan
%E Jan Peters
%E Jun Morimoto
%F pmlr-v87-biyik18a
%I PMLR
%P 519--528
%U https://proceedings.mlr.press/v87/biyik18a.html
%V 87
%X Data generation and labeling are usually an expensive part of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by querying users with preference questions. In this paper, we will develop a new algorithm, batch active preference-based learning, that enables efficient learning of reward functions using as few data samples as possible while still having short query generation times. We introduce several approximations to the batch active learning problem, and provide theoretical guarantees for the convergence of our algorithms. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We then showcase our algorithm in a study to learn human users’ preferences.

APA


Biyik, E. & Sadigh, D. (2018). Batch Active Preference-Based Learning of Reward Functions. Proceedings of The 2nd Conference on Robot Learning, in Proceedings of Machine Learning Research 87:519-528. Available from https://proceedings.mlr.press/v87/biyik18a.html.