An Adaptive Entropy-Regularization Framework for Multi-Agent Reinforcement Learning

Woojun Kim, Youngchul Sung
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:16829-16852, 2023.

Abstract

In this paper, we propose an adaptive entropy-regularization framework (ADER) for multi-agent reinforcement learning (RL) that learns the appropriate level of entropy-based exploration for each agent. To derive a metric for each agent's proper level of exploration entropy, we disentangle the soft value function into two parts: one for the pure return and the other for entropy. Applying multi-agent value factorization to the disentangled pure-return value function yields a metric for the appropriate level of exploration entropy of each agent: the partial derivative of the pure-return value function with respect to that agent's policy entropy. Based on this metric, we propose the ADER algorithm, built on maximum entropy RL, which controls the level of exploration across agents over time by learning a proper target entropy for each agent. Experimental results show that the proposed scheme significantly outperforms current state-of-the-art multi-agent RL algorithms.
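The abstract compresses the key construction, so a brief reconstruction may help; the notation below is inferred from the abstract alone and may differ from the paper's. Writing the soft value function as the sum of a pure-return part and an entropy part, the per-agent metric is the sensitivity of the pure return to agent i's policy entropy:

    V^{\mathrm{soft}}(s) = V^{R}(s) + V^{H}(s), \qquad
    m_i = \frac{\partial V^{R}(s)}{\partial \mathcal{H}(\pi_i)}

Below is a minimal sketch of how such a metric could drive per-agent target entropies, in the spirit of SAC-style automatic temperature tuning. The softmax allocation rule, the fixed total entropy budget, and all names are illustrative assumptions based only on the abstract, not the paper's exact algorithm.

    import torch

    def allocate_target_entropies(metric, total_budget):
        # Redistribute a fixed total entropy budget across agents in
        # proportion to each agent's exploration benefit m_i
        # (assumption: softmax weighting; the paper may use another rule).
        weights = torch.softmax(metric, dim=-1)
        return weights * total_budget

    def temperature_loss(log_alpha, log_pi, target_entropy):
        # SAC-style temperature objective, applied per agent: the
        # temperature rises when the policy's entropy falls below the
        # adaptive target and falls otherwise.
        return -(log_alpha.exp() * (log_pi + target_entropy).detach()).mean()

    # Example: three agents; the metric says agent 2 benefits most
    # from exploring, so it receives the largest target entropy.
    metric = torch.tensor([0.1, 0.5, 1.2])
    targets = allocate_target_entropies(metric, total_budget=3.0)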

Cite this Paper

BibTeX
@InProceedings{pmlr-v202-kim23v,
  title     = {An Adaptive Entropy-Regularization Framework for Multi-Agent Reinforcement Learning},
  author    = {Kim, Woojun and Sung, Youngchul},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {16829--16852},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/kim23v/kim23v.pdf},
  url       = {https://proceedings.mlr.press/v202/kim23v.html}
}
APA
Kim, W. & Sung, Y. (2023). An Adaptive Entropy-Regularization Framework for Multi-Agent Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:16829-16852. Available from https://proceedings.mlr.press/v202/kim23v.html.
