Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations

Xingyu Wang, Diego Klabjan
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5143-5151, 2018.

Abstract

This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be suboptimal. Compared to previous works that decouple agents in the game by assuming optimality in expert policies, we introduce a new objective function that directly pits experts against Nash Equilibrium policies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. To find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to existing benchmark algorithms. Moreover, our algorithm successfully recovers reward and policy functions regardless of the quality of the sub-optimal expert demonstration set.
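For intuition, the following is a minimal sketch of the kind of objective the abstract alludes to, stated for a two-player zero-sum stochastic game with a parameterized reward r_theta; the notation (expert policies pi^E, Nash Equilibrium policies pi*(theta), discount gamma) is introduced here only for illustration, and the paper's exact formulation may differ:

\max_{\theta}\;
  \mathbb{E}_{\pi_1^{*}(\theta),\,\pi_2^{E}}\Big[\sum_{t\ge 0}\gamma^{t}\, r_{\theta}(s_t,a_t^{1},a_t^{2})\Big]
  \;-\;
  \mathbb{E}_{\pi_1^{E},\,\pi_2^{E}}\Big[\sum_{t\ge 0}\gamma^{t}\, r_{\theta}(s_t,a_t^{1},a_t^{2})\Big],
  \qquad
  \big(\pi_1^{*}(\theta),\pi_2^{*}(\theta)\big)\in \operatorname{NE}(r_{\theta}).

Informally, under this reading the learned reward is chosen so that an equilibrium policy, when pitted against the expert opponent, outperforms the (possibly sub-optimal) expert pair by as large a margin as possible, rather than assuming the expert itself plays optimally.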

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-wang18d, title = {Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations}, author = {Wang, Xingyu and Klabjan, Diego}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {5143--5151}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/wang18d/wang18d.pdf}, url = {https://proceedings.mlr.press/v80/wang18d.html}, abstract = {This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be suboptimal. Compared to previous works that decouple agents in the game by assuming optimality in expert policies, we introduce a new objective function that directly pits experts against Nash Equilibrium policies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. To find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to existing benchmark algorithms. Moreover, our algorithm successfully recovers reward and policy functions regardless of the quality of the sub-optimal expert demonstration set.} }
Endnote
%0 Conference Paper %T Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations %A Xingyu Wang %A Diego Klabjan %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-wang18d %I PMLR %P 5143--5151 %U https://proceedings.mlr.press/v80/wang18d.html %V 80 %X This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be suboptimal. Compared to previous works that decouple agents in the game by assuming optimality in expert policies, we introduce a new objective function that directly pits experts against Nash Equilibrium policies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. To find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to existing benchmark algorithms. Moreover, our algorithm successfully recovers reward and policy functions regardless of the quality of the sub-optimal expert demonstration set.
APA
Wang, X. & Klabjan, D. (2018). Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5143-5151. Available from https://proceedings.mlr.press/v80/wang18d.html.
