Difference Advantage Estimation for Multi-Agent Policy Gradients

Yueheng Li; Guangming Xie; Zongqing Lu

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:13066-13085, 2022.

Abstract

Multi-agent policy gradient methods in centralized training with decentralized execution have recently made considerable progress. During centralized training, multi-agent credit assignment is crucial and can substantially improve learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods still receives little attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding in terms of its credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, analogous to GAE, that enables multi-agent credit assignment while allowing a tradeoff with policy bias. Empirical results show that our approach performs effective multi-agent credit assignment and thus substantially outperforms other advantage estimators.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-li22w,
  title = 	 {Difference Advantage Estimation for Multi-Agent Policy Gradients},
  author =       {Li, Yueheng and Xie, Guangming and Lu, Zongqing},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {13066--13085},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/li22w/li22w.pdf},
  url = 	 {https://proceedings.mlr.press/v162/li22w.html},
  abstract = 	 {Multi-agent policy gradient methods in centralized training with decentralized execution have recently made considerable progress. During centralized training, multi-agent credit assignment is crucial and can substantially improve learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods still receives little attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding in terms of its credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, analogous to GAE, that enables multi-agent credit assignment while allowing a tradeoff with policy bias. Empirical results show that our approach performs effective multi-agent credit assignment and thus substantially outperforms other advantage estimators.}
}

Endnote

%0 Conference Paper
%T Difference Advantage Estimation for Multi-Agent Policy Gradients
%A Yueheng Li
%A Guangming Xie
%A Zongqing Lu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-li22w
%I PMLR
%P 13066--13085
%U https://proceedings.mlr.press/v162/li22w.html
%V 162
%X Multi-agent policy gradient methods in centralized training with decentralized execution have recently made considerable progress. During centralized training, multi-agent credit assignment is crucial and can substantially improve learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods still receives little attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding in terms of its credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, analogous to GAE, that enables multi-agent credit assignment while allowing a tradeoff with policy bias. Empirical results show that our approach performs effective multi-agent credit assignment and thus substantially outperforms other advantage estimators.

APA


Li, Y., Xie, G. & Lu, Z. (2022). Difference Advantage Estimation for Multi-Agent Policy Gradients. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:13066-13085. Available from https://proceedings.mlr.press/v162/li22w.html.
