Multi-Agent Adversarial Inverse Reinforcement Learning

Lantao Yu, Jiaming Song, Stefano Ermon
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:7194-7201, 2019.

Abstract

Reinforcement learning agents are prone to undesired behaviors due to reward mis-specification. Finding a set of reward functions to properly guide agent behaviors is particularly challenging in multi-agent scenarios. Inverse reinforcement learning provides a framework to automatically acquire suitable reward functions from expert demonstrations. Its extension to multi-agent settings, however, is difficult due to the more complex notions of rational behaviors. In this paper, we propose MA-AIRL, a new framework for multi-agent inverse reinforcement learning, which is effective and scalable for Markov games with high-dimensional state-action space and unknown dynamics. We derive our algorithm based on a new solution concept and maximum pseudolikelihood estimation within an adversarial reward learning framework. In the experiments, we demonstrate that MA-AIRL can recover reward functions that are highly correlated with the ground truth rewards, while significantly outperforms prior methods in terms of policy imitation.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-yu19e, title = {Multi-Agent Adversarial Inverse Reinforcement Learning}, author = {Yu, Lantao and Song, Jiaming and Ermon, Stefano}, booktitle = {Proceedings of the 36th International Conference on Machine Learning}, pages = {7194--7201}, year = {2019}, editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan}, volume = {97}, series = {Proceedings of Machine Learning Research}, month = {09--15 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v97/yu19e/yu19e.pdf}, url = {http://proceedings.mlr.press/v97/yu19e.html}, abstract = {Reinforcement learning agents are prone to undesired behaviors due to reward mis-specification. Finding a set of reward functions to properly guide agent behaviors is particularly challenging in multi-agent scenarios. Inverse reinforcement learning provides a framework to automatically acquire suitable reward functions from expert demonstrations. Its extension to multi-agent settings, however, is difficult due to the more complex notions of rational behaviors. In this paper, we propose MA-AIRL, a new framework for multi-agent inverse reinforcement learning, which is effective and scalable for Markov games with high-dimensional state-action space and unknown dynamics. We derive our algorithm based on a new solution concept and maximum pseudolikelihood estimation within an adversarial reward learning framework. In the experiments, we demonstrate that MA-AIRL can recover reward functions that are highly correlated with the ground truth rewards, while significantly outperforms prior methods in terms of policy imitation.} }
Endnote
%0 Conference Paper %T Multi-Agent Adversarial Inverse Reinforcement Learning %A Lantao Yu %A Jiaming Song %A Stefano Ermon %B Proceedings of the 36th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Ruslan Salakhutdinov %F pmlr-v97-yu19e %I PMLR %P 7194--7201 %U http://proceedings.mlr.press/v97/yu19e.html %V 97 %X Reinforcement learning agents are prone to undesired behaviors due to reward mis-specification. Finding a set of reward functions to properly guide agent behaviors is particularly challenging in multi-agent scenarios. Inverse reinforcement learning provides a framework to automatically acquire suitable reward functions from expert demonstrations. Its extension to multi-agent settings, however, is difficult due to the more complex notions of rational behaviors. In this paper, we propose MA-AIRL, a new framework for multi-agent inverse reinforcement learning, which is effective and scalable for Markov games with high-dimensional state-action space and unknown dynamics. We derive our algorithm based on a new solution concept and maximum pseudolikelihood estimation within an adversarial reward learning framework. In the experiments, we demonstrate that MA-AIRL can recover reward functions that are highly correlated with the ground truth rewards, while significantly outperforms prior methods in terms of policy imitation.
APA
Yu, L., Song, J. & Ermon, S.. (2019). Multi-Agent Adversarial Inverse Reinforcement Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:7194-7201 Available from http://proceedings.mlr.press/v97/yu19e.html.

Related Material