Difference Advantage Estimation for Multi-Agent Policy Gradients

Yueheng Li, Guangming Xie, Zongqing Lu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:13066-13085, 2022.

Abstract

Multi-agent policy gradient methods under centralized training with decentralized execution have recently made considerable progress. During centralized training, multi-agent credit assignment is crucial and can substantially improve learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods has received comparatively little attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding of it in terms of credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, analogous to GAE, that enables multi-agent credit assignment while allowing a tradeoff with policy bias. Empirical results show that our approach performs effective multi-agent credit assignment and thus substantially outperforms other advantage estimators.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-li22w,
  title = {Difference Advantage Estimation for Multi-Agent Policy Gradients},
  author = {Li, Yueheng and Xie, Guangming and Lu, Zongqing},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages = {13066--13085},
  year = {2022},
  editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = {162},
  series = {Proceedings of Machine Learning Research},
  month = {17--23 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v162/li22w/li22w.pdf},
  url = {https://proceedings.mlr.press/v162/li22w.html},
  abstract = {Multi-agent policy gradient methods in centralized training with decentralized execution recently witnessed many progresses. During centralized training, multi-agent credit assignment is crucial, which can substantially promote learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods still receives less attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding in terms of its credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, which is analogous to GAE, to enable multi-agent credit assignment while allowing the tradeoff with policy bias. Empirical results show that our approach can successfully perform effective multi-agent credit assignment, and thus substantially outperforms other advantage estimators.}
}
Endnote
%0 Conference Paper
%T Difference Advantage Estimation for Multi-Agent Policy Gradients
%A Yueheng Li
%A Guangming Xie
%A Zongqing Lu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-li22w
%I PMLR
%P 13066--13085
%U https://proceedings.mlr.press/v162/li22w.html
%V 162
%X Multi-agent policy gradient methods in centralized training with decentralized execution recently witnessed many progresses. During centralized training, multi-agent credit assignment is crucial, which can substantially promote learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods still receives less attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding in terms of its credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, which is analogous to GAE, to enable multi-agent credit assignment while allowing the tradeoff with policy bias. Empirical results show that our approach can successfully perform effective multi-agent credit assignment, and thus substantially outperforms other advantage estimators.
APA
Li, Y., Xie, G. & Lu, Z. (2022). Difference Advantage Estimation for Multi-Agent Policy Gradients. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:13066-13085. Available from https://proceedings.mlr.press/v162/li22w.html.
