Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning

Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Changjie Fan, Fei Wu, Jun Xiao
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:12843-12856, 2022.

Abstract

Value decomposition (VD) methods have been widely used in cooperative multi-agent reinforcement learning (MARL), where credit assignment plays an important role in guiding the agents’ decentralized execution. In this paper, we investigate VD from a novel causal-inference perspective. We first show that the environment in existing VD methods acts as an unobserved confounder, the common cause of both the global state and the joint value function, which leads to confounding bias in learning credit assignment. We then present our approach, deconfounded value decomposition (DVD), which cuts off the backdoor confounding path from the global state to the joint value function. The cut is implemented by introducing the trajectory graph, which depends only on the agents’ local trajectories, as a proxy confounder. DVD is general enough to be applied to various VD methods, and extensive experiments show that it consistently achieves significant performance gains over different state-of-the-art VD methods on the StarCraft II and MACO benchmarks.
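To make the deconfounding step concrete, the standard backdoor adjustment from causal inference can be sketched as follows. The notation here is our own shorthand, not necessarily the paper’s: $s$ denotes the global state, $Q_{tot}$ the joint value function, and $g$ a value of the trajectory-graph proxy confounder. Conditioning on $g$ and averaging over its distribution blocks the backdoor path from $s$ to $Q_{tot}$ that runs through the unobserved environment:

% Backdoor adjustment with the trajectory graph g as a proxy confounder
% (our shorthand; the paper's exact notation may differ):
\[
  P\bigl(Q_{tot} \mid \operatorname{do}(s)\bigr)
    \;=\; \sum_{g} P\bigl(Q_{tot} \mid s,\, g\bigr)\, P(g)
\]

Intuitively, the interventional quantity $P(Q_{tot} \mid \operatorname{do}(s))$ replaces the confounded conditional $P(Q_{tot} \mid s)$, so credit assignment reflects the causal effect of the state rather than spurious, environment-induced correlation.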

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-li22l,
  title     = {Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning},
  author    = {Li, Jiahui and Kuang, Kun and Wang, Baoxiang and Liu, Furui and Chen, Long and Fan, Changjie and Wu, Fei and Xiao, Jun},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {12843--12856},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/li22l/li22l.pdf},
  url       = {https://proceedings.mlr.press/v162/li22l.html},
  abstract  = {Value decomposition (VD) methods have been widely used in cooperative multi-agent reinforcement learning (MARL), where credit assignment plays an important role in guiding the agents’ decentralized execution. In this paper, we investigate VD from a novel causal-inference perspective. We first show that the environment in existing VD methods acts as an unobserved confounder, the common cause of both the global state and the joint value function, which leads to confounding bias in learning credit assignment. We then present our approach, deconfounded value decomposition (DVD), which cuts off the backdoor confounding path from the global state to the joint value function. The cut is implemented by introducing the trajectory graph, which depends only on the agents’ local trajectories, as a proxy confounder. DVD is general enough to be applied to various VD methods, and extensive experiments show that it consistently achieves significant performance gains over different state-of-the-art VD methods on the StarCraft II and MACO benchmarks.}
}
Endnote
%0 Conference Paper
%T Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning
%A Jiahui Li
%A Kun Kuang
%A Baoxiang Wang
%A Furui Liu
%A Long Chen
%A Changjie Fan
%A Fei Wu
%A Jun Xiao
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-li22l
%I PMLR
%P 12843--12856
%U https://proceedings.mlr.press/v162/li22l.html
%V 162
%X Value decomposition (VD) methods have been widely used in cooperative multi-agent reinforcement learning (MARL), where credit assignment plays an important role in guiding the agents’ decentralized execution. In this paper, we investigate VD from a novel causal-inference perspective. We first show that the environment in existing VD methods acts as an unobserved confounder, the common cause of both the global state and the joint value function, which leads to confounding bias in learning credit assignment. We then present our approach, deconfounded value decomposition (DVD), which cuts off the backdoor confounding path from the global state to the joint value function. The cut is implemented by introducing the trajectory graph, which depends only on the agents’ local trajectories, as a proxy confounder. DVD is general enough to be applied to various VD methods, and extensive experiments show that it consistently achieves significant performance gains over different state-of-the-art VD methods on the StarCraft II and MACO benchmarks.
APA
Li, J., Kuang, K., Wang, B., Liu, F., Chen, L., Fan, C., Wu, F. & Xiao, J. (2022). Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:12843-12856. Available from https://proceedings.mlr.press/v162/li22l.html.