A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines

Weichao Zhou, Wenchao Li
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:27159-27178, 2022.

Abstract

A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems. We propose symbolic reward machines for incorporating high-level task knowledge when specifying reward signals. Symbolic reward machines augment the existing reward machine formalism by allowing transitions to carry predicates and symbolic reward outputs. This formalism lends itself well to inverse reinforcement learning, where the key challenge is determining appropriate assignments to the symbolic values from a few expert demonstrations. We propose a hierarchical Bayesian approach for inferring the most likely assignments such that the concretized reward machine can discriminate expert-demonstrated trajectories from other trajectories with high accuracy. Experimental results show that learned reward machines can significantly improve training efficiency for complex RL tasks and generalize well across different task environment configurations.
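
To make the formalism concrete, below is a minimal sketch of a symbolic reward machine in Python. This is not the authors' implementation; all names (SymbolicRewardMachine, Transition, step, the labels dict) are assumptions for illustration. Each transition carries a predicate over high-level labels of an environment step and emits either a concrete reward or a named symbolic value; concretizing the machine amounts to choosing an assignment for those symbols.

```python
# Illustrative sketch of a symbolic reward machine (SRM); hypothetical names,
# not the paper's code. Transitions carry a predicate and a reward that is
# either a concrete float or a symbolic variable name to be inferred later.

from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Reward = Union[float, str]  # a float, or the name of a symbolic reward value


@dataclass
class Transition:
    src: int                           # source machine state
    dst: int                           # destination machine state
    predicate: Callable[[dict], bool]  # condition on high-level step labels
    reward: Reward                     # concrete value or symbolic name


class SymbolicRewardMachine:
    def __init__(self, init_state: int, transitions: List[Transition]):
        self.init_state = init_state
        self.transitions = transitions

    def step(self, state: int, labels: dict, assignment: Dict[str, float]):
        """Advance the machine on one environment step; return (next_state, reward)."""
        for t in self.transitions:
            if t.src == state and t.predicate(labels):
                r = assignment[t.reward] if isinstance(t.reward, str) else t.reward
                return t.dst, r
        return state, 0.0  # no matching transition: stay put, zero reward


# Toy task: reach a key, then a door. Rewards r_key and r_door are symbolic.
srm = SymbolicRewardMachine(
    init_state=0,
    transitions=[
        Transition(0, 1, lambda lb: lb.get("at_key", False), "r_key"),
        Transition(1, 2, lambda lb: lb.get("at_door", False), "r_door"),
    ],
)

# Concretize by assigning values to the symbols, then run a toy trajectory.
assignment = {"r_key": 0.1, "r_door": 1.0}
state, total = srm.init_state, 0.0
for labels in [{"at_key": True}, {}, {"at_door": True}]:
    state, r = srm.step(state, labels, assignment)
    total += r
print(state, total)  # -> 2 1.1
```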

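The inference problem in the abstract, finding assignments under which the concretized machine discriminates expert trajectories from others, can likewise be sketched. The snippet below is a crude stand-in for the paper's hierarchical Bayesian procedure: it samples candidate assignments from a uniform prior and keeps the one with the largest separation margin. It reuses the srm object and step method from the sketch above; the margin-based score is an assumption made for illustration.

```python
# Hedged sketch of the inference idea, not the paper's algorithm: score
# candidate symbol assignments by how well the concretized machine separates
# expert trajectories from other trajectories. Reuses `srm` from above.

import random


def cumulative_reward(srm, trajectory, assignment):
    """Total reward the concretized machine assigns to a labeled trajectory."""
    state, total = srm.init_state, 0.0
    for labels in trajectory:
        state, r = srm.step(state, labels, assignment)
        total += r
    return total


expert_trajs = [[{"at_key": True}, {"at_door": True}]]
other_trajs = [[{}, {}], [{"at_door": True}, {}]]  # e.g. random rollouts

best, best_score = None, float("-inf")
for _ in range(1000):  # uniform prior sampling in place of full posterior inference
    cand = {"r_key": random.uniform(0, 1), "r_door": random.uniform(0, 1)}
    margin = (min(cumulative_reward(srm, t, cand) for t in expert_trajs)
              - max(cumulative_reward(srm, t, cand) for t in other_trajs))
    if margin > best_score:
        best, best_score = cand, margin
print(best, best_score)
```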
Cite this Paper


BibTeX
@InProceedings{pmlr-v162-zhou22b,
  title     = {A Hierarchical {B}ayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines},
  author    = {Zhou, Weichao and Li, Wenchao},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {27159--27178},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/zhou22b/zhou22b.pdf},
  url       = {https://proceedings.mlr.press/v162/zhou22b.html},
  abstract  = {A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems. We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals. Symbolic reward machines augment existing reward machine formalism by allowing transitions to carry predicates and symbolic reward outputs. This formalism lends itself well to inverse reinforcement learning, whereby the key challenge is determining appropriate assignments to the symbolic values from a few expert demonstrations. We propose a hierarchical Bayesian approach for inferring the most likely assignments such that the concretized reward machine can discriminate expert demonstrated trajectories from other trajectories with high accuracy. Experimental results show that learned reward machines can significantly improve training efficiency for complex RL tasks and generalize well across different task environment configurations.}
}
Endnote
%0 Conference Paper
%T A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines
%A Weichao Zhou
%A Wenchao Li
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-zhou22b
%I PMLR
%P 27159--27178
%U https://proceedings.mlr.press/v162/zhou22b.html
%V 162
%X A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems. We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals. Symbolic reward machines augment existing reward machine formalism by allowing transitions to carry predicates and symbolic reward outputs. This formalism lends itself well to inverse reinforcement learning, whereby the key challenge is determining appropriate assignments to the symbolic values from a few expert demonstrations. We propose a hierarchical Bayesian approach for inferring the most likely assignments such that the concretized reward machine can discriminate expert demonstrated trajectories from other trajectories with high accuracy. Experimental results show that learned reward machines can significantly improve training efficiency for complex RL tasks and generalize well across different task environment configurations.
APA
Zhou, W. & Li, W. (2022). A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:27159-27178. Available from https://proceedings.mlr.press/v162/zhou22b.html.
