Individual Reward Assisted Multi-Agent Reinforcement Learning

Li Wang, Yupeng Zhang, Yujing Hu, Weixun Wang, Chongjie Zhang, Yang Gao, Jianye Hao, Tangjie Lv, Changjie Fan
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:23417-23432, 2022.

Abstract

In many real-world multi-agent systems, the sparsity of team rewards often makes it difficult for an algorithm to successfully learn a cooperative team policy. At present, the common way for solving this problem is to design some dense individual rewards for the agents to guide the cooperation. However, most existing works utilize individual rewards in ways that do not always promote teamwork and sometimes are even counterproductive. In this paper, we propose Individual Reward Assisted Team Policy Learning (IRAT), which learns two policies for each agent from the dense individual reward and the sparse team reward with discrepancy constraints for updating the two policies mutually. Experimental results in different scenarios, such as the Multi-Agent Particle Environment and the Google Research Football Environment, show that IRAT significantly outperforms the baseline methods and can greatly promote team policy learning without deviating from the original team objective, even when the individual rewards are misleading or conflict with the team rewards.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-wang22ao, title = {Individual Reward Assisted Multi-Agent Reinforcement Learning}, author = {Wang, Li and Zhang, Yupeng and Hu, Yujing and Wang, Weixun and Zhang, Chongjie and Gao, Yang and Hao, Jianye and Lv, Tangjie and Fan, Changjie}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, pages = {23417--23432}, year = {2022}, editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan}, volume = {162}, series = {Proceedings of Machine Learning Research}, month = {17--23 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v162/wang22ao/wang22ao.pdf}, url = {https://proceedings.mlr.press/v162/wang22ao.html}, abstract = {In many real-world multi-agent systems, the sparsity of team rewards often makes it difficult for an algorithm to successfully learn a cooperative team policy. At present, the common way for solving this problem is to design some dense individual rewards for the agents to guide the cooperation. However, most existing works utilize individual rewards in ways that do not always promote teamwork and sometimes are even counterproductive. In this paper, we propose Individual Reward Assisted Team Policy Learning (IRAT), which learns two policies for each agent from the dense individual reward and the sparse team reward with discrepancy constraints for updating the two policies mutually. Experimental results in different scenarios, such as the Multi-Agent Particle Environment and the Google Research Football Environment, show that IRAT significantly outperforms the baseline methods and can greatly promote team policy learning without deviating from the original team objective, even when the individual rewards are misleading or conflict with the team rewards.} }
Endnote
%0 Conference Paper %T Individual Reward Assisted Multi-Agent Reinforcement Learning %A Li Wang %A Yupeng Zhang %A Yujing Hu %A Weixun Wang %A Chongjie Zhang %A Yang Gao %A Jianye Hao %A Tangjie Lv %A Changjie Fan %B Proceedings of the 39th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Kamalika Chaudhuri %E Stefanie Jegelka %E Le Song %E Csaba Szepesvari %E Gang Niu %E Sivan Sabato %F pmlr-v162-wang22ao %I PMLR %P 23417--23432 %U https://proceedings.mlr.press/v162/wang22ao.html %V 162 %X In many real-world multi-agent systems, the sparsity of team rewards often makes it difficult for an algorithm to successfully learn a cooperative team policy. At present, the common way for solving this problem is to design some dense individual rewards for the agents to guide the cooperation. However, most existing works utilize individual rewards in ways that do not always promote teamwork and sometimes are even counterproductive. In this paper, we propose Individual Reward Assisted Team Policy Learning (IRAT), which learns two policies for each agent from the dense individual reward and the sparse team reward with discrepancy constraints for updating the two policies mutually. Experimental results in different scenarios, such as the Multi-Agent Particle Environment and the Google Research Football Environment, show that IRAT significantly outperforms the baseline methods and can greatly promote team policy learning without deviating from the original team objective, even when the individual rewards are misleading or conflict with the team rewards.
APA
Wang, L., Zhang, Y., Hu, Y., Wang, W., Zhang, C., Gao, Y., Hao, J., Lv, T. & Fan, C.. (2022). Individual Reward Assisted Multi-Agent Reinforcement Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:23417-23432 Available from https://proceedings.mlr.press/v162/wang22ao.html.

Related Material