Provably Efficient Learning of Transferable Rewards

Alberto Maria Metelli, Giorgia Ramponi, Alessandro Concetti, Marcello Restelli
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7665-7676, 2021.

Abstract

The reward function is widely accepted as a succinct, robust, and transferable representation of a task. Typical approaches, forming the basis of Inverse Reinforcement Learning (IRL), leverage expert demonstrations to recover a reward function. In this paper, we study the theoretical properties of the class of reward functions that are compatible with the expert’s behavior. We analyze how limited knowledge of the expert’s policy and of the environment affects the reward reconstruction phase. Then, we examine how the error propagates to the learned policy’s performance when the reward function is transferred to a different environment. We employ these findings to devise a provably efficient active sampling approach, aware of the need to transfer the reward function, that can be paired with a wide variety of IRL algorithms. Finally, we provide numerical simulations on benchmark environments.
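
As a concrete illustration of the notion of compatibility the abstract refers to, the minimal sketch below implements the classical characterization of the feasible reward set for tabular MDPs due to Ng & Russell (2000), which this line of work builds on: a state-only reward makes a deterministic expert policy (weakly) optimal iff a componentwise linear inequality on the induced value function holds for every action. This is an illustrative sketch under those assumptions, not the paper's algorithm; the function name is_compatible and the toy MDP are hypothetical.

import numpy as np

def is_compatible(P, expert_actions, r, gamma=0.9, tol=1e-8):
    """Check whether the state-only reward vector r makes the expert's
    deterministic policy (weakly) optimal, via the Ng & Russell (2000)
    condition: (P_expert - P_a) (I - gamma P_expert)^{-1} r >= 0 for all a.

    P: (n_actions, n_states, n_states) transition kernels, one per action.
    expert_actions: (n_states,) array with the expert's action in each state.
    """
    n_actions, n_states, _ = P.shape
    # Transition matrix induced by following the expert's policy.
    P_exp = P[expert_actions, np.arange(n_states), :]
    # Expert's value function under reward r: v = (I - gamma * P_exp)^{-1} r.
    v = np.linalg.inv(np.eye(n_states) - gamma * P_exp) @ r
    # The expert's action must be weakly preferred to every alternative
    # in every state (componentwise inequality, up to numerical tolerance).
    return all(np.all((P_exp - P[a]) @ v >= -tol) for a in range(n_actions))

# Toy 2-state, 2-action MDP: action 0 stays put, action 1 swaps states.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
expert = np.array([0, 1])  # the expert always heads for state 0
print(is_compatible(P, expert, r=np.array([1.0, 0.0])))  # True
print(is_compatible(P, expert, r=np.array([0.0, 1.0])))  # False

Read this way, the paper's questions are how well such a feasible set can be recovered when the expert's policy and the transition kernels are only estimated from samples, and how the resulting error transfers when the recovered reward is used in a different environment.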

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-metelli21a,
  title     = {Provably Efficient Learning of Transferable Rewards},
  author    = {Metelli, Alberto Maria and Ramponi, Giorgia and Concetti, Alessandro and Restelli, Marcello},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {7665--7676},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/metelli21a/metelli21a.pdf},
  url       = {https://proceedings.mlr.press/v139/metelli21a.html},
  abstract  = {The reward function is widely accepted as a succinct, robust, and transferable representation of a task. Typical approaches, forming the basis of Inverse Reinforcement Learning (IRL), leverage expert demonstrations to recover a reward function. In this paper, we study the theoretical properties of the class of reward functions that are compatible with the expert’s behavior. We analyze how limited knowledge of the expert’s policy and of the environment affects the reward reconstruction phase. Then, we examine how the error propagates to the learned policy’s performance when the reward function is transferred to a different environment. We employ these findings to devise a provably efficient active sampling approach, aware of the need to transfer the reward function, that can be paired with a wide variety of IRL algorithms. Finally, we provide numerical simulations on benchmark environments.}
}
Endnote
%0 Conference Paper
%T Provably Efficient Learning of Transferable Rewards
%A Alberto Maria Metelli
%A Giorgia Ramponi
%A Alessandro Concetti
%A Marcello Restelli
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-metelli21a
%I PMLR
%P 7665--7676
%U https://proceedings.mlr.press/v139/metelli21a.html
%V 139
%X The reward function is widely accepted as a succinct, robust, and transferable representation of a task. Typical approaches, forming the basis of Inverse Reinforcement Learning (IRL), leverage expert demonstrations to recover a reward function. In this paper, we study the theoretical properties of the class of reward functions that are compatible with the expert’s behavior. We analyze how limited knowledge of the expert’s policy and of the environment affects the reward reconstruction phase. Then, we examine how the error propagates to the learned policy’s performance when the reward function is transferred to a different environment. We employ these findings to devise a provably efficient active sampling approach, aware of the need to transfer the reward function, that can be paired with a wide variety of IRL algorithms. Finally, we provide numerical simulations on benchmark environments.
APA
Metelli, A.M., Ramponi, G., Concetti, A. & Restelli, M. (2021). Provably Efficient Learning of Transferable Rewards. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7665-7676. Available from https://proceedings.mlr.press/v139/metelli21a.html.