Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning

Lipeng Wan, Zeyang Liu, Xingyu Chen, Xuguang Lan, Nanning Zheng
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:22512-22535, 2022.

Abstract

Due to the representation limitation of the joint Q value function, multi-agent reinforcement learning methods with linear value decomposition (LVD) or monotonic value decomposition (MVD) suffer from relative overgeneralization. As a result, they cannot ensure optimal consistency (i.e., the correspondence between individual greedy actions and the best team performance). In this paper, we derive the expression of the joint Q value function under LVD and MVD. Based on this expression, we draw a transition diagram in which each self-transition node (STN) is a possible point of convergence. To ensure optimal consistency, the optimal node must be the unique STN. We therefore propose greedy-based value representation (GVR), which turns the optimal node into an STN via inferior target shaping and eliminates non-optimal STNs via superior experience replay. Theoretical proofs and empirical results demonstrate that, given the true Q values, GVR ensures optimal consistency under sufficient exploration. Moreover, in tasks where the true Q values are unavailable, GVR achieves an adaptive trade-off between optimality and stability. Our method outperforms state-of-the-art baselines on various benchmarks.
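
The representation limitation described above can be made concrete in a one-step cooperative matrix game. The sketch below is illustrative only (not code from the paper; the payoff values and variable names are assumptions for the example). It fits an additive decomposition Q_tot(a1, a2) = Q_1(a1) + Q_2(a2), the LVD form, to a classic payoff matrix under uniform sampling of joint actions, and shows that the resulting individual greedy actions miss the coordinated optimum, which is the relative overgeneralization failure that GVR is designed to remove.

# Minimal sketch (illustrative, not from the paper): relative overgeneralization
# under linear value decomposition, Q_tot(a1, a2) = Q_1(a1) + Q_2(a2).
import numpy as np

# One-step cooperative matrix game; the optimum 8 at joint action (0, 0)
# is surrounded by heavy penalties, so an additive fit undervalues it.
payoff = np.array([[  8.0, -12.0, -12.0],
                   [-12.0,   0.0,   0.0],
                   [-12.0,   0.0,   0.0]])
n = payoff.shape[0]

# Least-squares fit of Q_1(a1) + Q_2(a2) to the payoffs, with every joint
# action weighted equally (i.e., uniform exploration of the joint action space).
a1, a2 = np.indices((n, n))
features = np.zeros((n * n, 2 * n))
features[np.arange(n * n), a1.ravel()] = 1.0        # one-hot encoding of a1
features[np.arange(n * n), n + a2.ravel()] = 1.0    # one-hot encoding of a2
theta, *_ = np.linalg.lstsq(features, payoff.ravel(), rcond=None)
q1, q2 = theta[:n], theta[n:]

greedy = (int(np.argmax(q1)), int(np.argmax(q2)))
print("greedy joint action under LVD:", greedy)     # lands in the payoff-0 block
print("team payoff it achieves:", payoff[greedy])   # 0.0 rather than the optimal 8.0

Under this fit the greedy joint action yields a payoff of 0 instead of the optimal 8 at (0, 0); loosely, the optimal node is not a stable point of convergence. How inferior target shaping and superior experience replay repair this is specified in the paper itself and is not reproduced in the sketch.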

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-wan22c,
  title     = {Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning},
  author    = {Wan, Lipeng and Liu, Zeyang and Chen, Xingyu and Lan, Xuguang and Zheng, Nanning},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {22512--22535},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/wan22c/wan22c.pdf},
  url       = {https://proceedings.mlr.press/v162/wan22c.html},
  abstract  = {Due to the representation limitation of the joint Q value function, multi-agent reinforcement learning methods with linear value decomposition (LVD) or monotonic value decomposition (MVD) suffer from relative overgeneralization. As a result, they can not ensure optimal consistency (i.e., the correspondence between individual greedy actions and the best team performance). In this paper, we derive the expression of the joint Q value function of LVD and MVD. According to the expression, we draw a transition diagram, where each self-transition node (STN) is a possible convergence. To ensure the optimal consistency, the optimal node is required to be the unique STN. Therefore, we propose the greedy-based value representation (GVR), which turns the optimal node into an STN via inferior target shaping and eliminates the non-optimal STNs via superior experience replay. Theoretical proofs and empirical results demonstrate that given the true Q values, GVR ensures the optimal consistency under sufficient exploration. Besides, in tasks where the true Q values are unavailable, GVR achieves an adaptive trade-off between optimality and stability. Our method outperforms state-of-the-art baselines in experiments on various benchmarks.}
}
Endnote
%0 Conference Paper
%T Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning
%A Lipeng Wan
%A Zeyang Liu
%A Xingyu Chen
%A Xuguang Lan
%A Nanning Zheng
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-wan22c
%I PMLR
%P 22512--22535
%U https://proceedings.mlr.press/v162/wan22c.html
%V 162
%X Due to the representation limitation of the joint Q value function, multi-agent reinforcement learning methods with linear value decomposition (LVD) or monotonic value decomposition (MVD) suffer from relative overgeneralization. As a result, they can not ensure optimal consistency (i.e., the correspondence between individual greedy actions and the best team performance). In this paper, we derive the expression of the joint Q value function of LVD and MVD. According to the expression, we draw a transition diagram, where each self-transition node (STN) is a possible convergence. To ensure the optimal consistency, the optimal node is required to be the unique STN. Therefore, we propose the greedy-based value representation (GVR), which turns the optimal node into an STN via inferior target shaping and eliminates the non-optimal STNs via superior experience replay. Theoretical proofs and empirical results demonstrate that given the true Q values, GVR ensures the optimal consistency under sufficient exploration. Besides, in tasks where the true Q values are unavailable, GVR achieves an adaptive trade-off between optimality and stability. Our method outperforms state-of-the-art baselines in experiments on various benchmarks.
APA
Wan, L., Liu, Z., Chen, X., Lan, X. & Zheng, N. (2022). Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:22512-22535. Available from https://proceedings.mlr.press/v162/wan22c.html.