An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning

Dhruv Malik, Malayandi Palaniappan, Jaime Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca Dragan
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:3394-3402, 2018.

Abstract

Our goal is for AI systems to correctly identify and act according to their human user’s objectives. Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and robot, in which only the human knows the parameters of the reward function: the robot needs to learn them as the interaction unfolds. Previous work showed that CIRL can be solved as a POMDP, but with an action space size exponential in the size of the reward parameter space. In this work, we exploit a specific property of CIRL: the human is a full information agent. This enables us to derive an optimality-preserving modification to the standard Bellman update, which reduces the complexity of the problem by an exponential factor. Additionally, we show that our modified Bellman update allows us to relax CIRL’s assumption of human rationality. We apply this update to a variety of POMDP solvers, including exact methods, point-based methods, and Monte Carlo Tree Search methods. We find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces, and larger action spaces for both robot and human. In solutions to these larger problems, the human exhibits pedagogical (teaching) behavior, while the robot interprets it as such and attains higher value for the human.
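
The computational idea the abstract summarizes can be illustrated with a short, self-contained sketch (not the paper's code). In the reduction of CIRL to a POMDP, a robot-side action pairs a physical action with a conditional human policy δ_H : Θ → A_H, so a naive backup enumerates |A_H|^|Θ| such policies; because the human observes θ, the maximization over the human's action can instead be carried out separately for each θ inside the expectation over the robot's belief, costing only |A_H|·|Θ| per robot action. The one-step (myopic) Python sketch below uses hypothetical toy sizes and a random reward table to check that both backups agree while differing exponentially in cost; the full recursive update in the paper also propagates the next-step value, which is omitted here.

```python
import itertools
import numpy as np

# Hypothetical toy problem sizes: reward parameters, robot actions, human actions.
THETA = range(4)          # |Theta| = 4 candidate reward parameters
ROBOT_ACTIONS = range(3)  # |A_R| = 3
HUMAN_ACTIONS = range(5)  # |A_H| = 5

rng = np.random.default_rng(0)
# reward[theta, a_h, a_r]: one-step reward, known only to the human via theta.
reward = rng.random((len(THETA), len(HUMAN_ACTIONS), len(ROBOT_ACTIONS)))

# Robot's belief over theta (uniform for the sketch).
belief = np.full(len(THETA), 1.0 / len(THETA))


def naive_backup():
    """Enumerate every conditional human policy delta_H: Theta -> A_H.

    Cost per backup: O(|A_R| * |A_H| ** |Theta|) evaluations.
    """
    best = -np.inf
    for a_r in ROBOT_ACTIONS:
        for delta_h in itertools.product(HUMAN_ACTIONS, repeat=len(THETA)):
            value = sum(belief[t] * reward[t, delta_h[t], a_r] for t in THETA)
            best = max(best, value)
    return best


def modified_backup():
    """Maximize over the human's action separately for each theta.

    Because the human observes theta, the max moves inside the
    expectation, costing only O(|A_R| * |A_H| * |Theta|).
    """
    best = -np.inf
    for a_r in ROBOT_ACTIONS:
        value = sum(belief[t] * reward[t, :, a_r].max() for t in THETA)
        best = max(best, value)
    return best


# The two backups return the same value; only their cost differs.
assert np.isclose(naive_backup(), modified_backup())
```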

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-malik18a,
  title     = {An Efficient, Generalized {B}ellman Update For Cooperative Inverse Reinforcement Learning},
  author    = {Malik, Dhruv and Palaniappan, Malayandi and Fisac, Jaime and Hadfield-Menell, Dylan and Russell, Stuart and Dragan, Anca},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {3394--3402},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/malik18a/malik18a.pdf},
  url       = {https://proceedings.mlr.press/v80/malik18a.html}
}
Endnote
%0 Conference Paper
%T An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
%A Dhruv Malik
%A Malayandi Palaniappan
%A Jaime Fisac
%A Dylan Hadfield-Menell
%A Stuart Russell
%A Anca Dragan
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-malik18a
%I PMLR
%P 3394--3402
%U https://proceedings.mlr.press/v80/malik18a.html
%V 80
APA
Malik, D., Palaniappan, M., Fisac, J., Hadfield-Menell, D., Russell, S. & Dragan, A. (2018). An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:3394-3402. Available from https://proceedings.mlr.press/v80/malik18a.html.
