Reward Identification in Inverse Reinforcement Learning

Kuno Kim; Shivam Garg; Kirankumar Shiragur; Stefano Ermon

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5496-5505, 2021.

Abstract

We study the problem of reward identifiability in the context of Inverse Reinforcement Learning (IRL). The reward identifiability question is critical to answer when reasoning about the effectiveness of using Markov Decision Processes (MDPs) as computational models of real-world decision makers in order to understand complex decision-making behavior and perform counterfactual reasoning. While identifiability has been acknowledged as a fundamental theoretical question in IRL, little is known about the types of MDPs for which rewards are identifiable, or even whether such MDPs exist. In this work, we formalize the reward identification problem in IRL and study how identifiability relates to properties of the MDP model. For deterministic MDP models with the MaxEntRL objective, we prove necessary and sufficient conditions for identifiability. Building on these results, we present efficient algorithms for testing whether or not an MDP model is identifiable.
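The central question here, whether a reward can be recovered from the behavior it induces, is non-trivial even in small deterministic MDPs. As a minimal sketch, not the paper's identifiability test, assuming an infinite-horizon discounted MDP and the standard soft (log-sum-exp) Bellman backup used in MaxEnt RL, the Python below builds two different reward tables that induce the same MaxEnt policy: the second is obtained from the first by potential-based shaping. The MDP, the rewards, the potential, and the helper names `soft_policy` and `shape` are all hypothetical.

```python
import math

# Hypothetical 3-state, 2-action deterministic MDP: NEXT[s][a] is the successor of (s, a).
NEXT = [[1, 2], [2, 0], [0, 1]]
GAMMA = 0.9  # assumed discount factor


def soft_policy(reward, next_state=NEXT, gamma=GAMMA, iters=2000):
    """Soft (MaxEnt) value iteration; returns pi[s][a] = exp(Q(s, a) - V(s))."""
    n = len(next_state)
    v = [0.0] * n
    for _ in range(iters):
        # Soft Bellman backup: Q(s, a) = r(s, a) + gamma * V(next(s, a)).
        q = [[reward[s][a] + gamma * v[next_state[s][a]] for a in range(len(next_state[s]))]
             for s in range(n)]
        # Soft value: V(s) = log sum_a exp Q(s, a).
        v = [math.log(sum(math.exp(x) for x in row)) for row in q]
    return [[math.exp(q[s][a] - v[s]) for a in range(len(q[s]))] for s in range(n)]


def shape(reward, phi, next_state=NEXT, gamma=GAMMA):
    """Potential-based shaping: r'(s, a) = r(s, a) + gamma * phi(next(s, a)) - phi(s)."""
    return [[reward[s][a] + gamma * phi[next_state[s][a]] - phi[s]
             for a in range(len(reward[s]))] for s in range(len(reward))]


r = [[1.0, 0.0], [0.5, 2.0], [0.0, 0.3]]  # hypothetical reward table
phi = [3.0, -1.0, 0.5]                     # arbitrary potential function
r_shaped = shape(r, phi)

pi, pi_shaped = soft_policy(r), soft_policy(r_shaped)
max_gap = max(abs(x - y) for row, row_s in zip(pi, pi_shaped) for x, y in zip(row, row_s))
print("rewards differ:", r != r_shaped)
print(f"max policy difference: {max_gap:.1e}")
```

Under shaping, the soft Q-function becomes Q(s, a) - phi(s) and the soft value becomes V(s) - phi(s), so the shift cancels in pi(a|s) = exp(Q(s, a) - V(s)) and the printed policy gap is at floating-point round-off level. An observer who sees only the MaxEnt policy therefore cannot tell these two rewards apart without further assumptions.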

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-kim21c,
  title     = {Reward Identification in Inverse Reinforcement Learning},
  author    = {Kim, Kuno and Garg, Shivam and Shiragur, Kirankumar and Ermon, Stefano},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {5496--5505},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/kim21c/kim21c.pdf},
  url       = {https://proceedings.mlr.press/v139/kim21c.html},
  abstract  = {We study the problem of reward identifiability in the context of Inverse Reinforcement Learning (IRL). The reward identifiability question is critical to answer when reasoning about the effectiveness of using Markov Decision Processes (MDPs) as computational models of real world decision makers in order to understand complex decision making behavior and perform counterfactual reasoning. While identifiability has been acknowledged as a fundamental theoretical question in IRL, little is known about the types of MDPs for which rewards are identifiable, or even if there exist such MDPs. In this work, we formalize the reward identification problem in IRL and study how identifiability relates to properties of the MDP model. For deterministic MDP models with the MaxEntRL objective, we prove necessary and sufficient conditions for identifiability. Building on these results, we present efficient algorithms for testing whether or not an MDP model is identifiable.}
}

Endnote

%0 Conference Paper
%T Reward Identification in Inverse Reinforcement Learning
%A Kuno Kim
%A Shivam Garg
%A Kirankumar Shiragur
%A Stefano Ermon
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-kim21c
%I PMLR
%P 5496--5505
%U https://proceedings.mlr.press/v139/kim21c.html
%V 139
%X We study the problem of reward identifiability in the context of Inverse Reinforcement Learning (IRL). The reward identifiability question is critical to answer when reasoning about the effectiveness of using Markov Decision Processes (MDPs) as computational models of real world decision makers in order to understand complex decision making behavior and perform counterfactual reasoning. While identifiability has been acknowledged as a fundamental theoretical question in IRL, little is known about the types of MDPs for which rewards are identifiable, or even if there exist such MDPs. In this work, we formalize the reward identification problem in IRL and study how identifiability relates to properties of the MDP model. For deterministic MDP models with the MaxEntRL objective, we prove necessary and sufficient conditions for identifiability. Building on these results, we present efficient algorithms for testing whether or not an MDP model is identifiable.

APA

Kim, K., Garg, S., Shiragur, K., & Ermon, S. (2021). Reward Identification in Inverse Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:5496-5505. Available from https://proceedings.mlr.press/v139/kim21c.html.
