[edit]
Privacy-Preserving Decentralized Actor-Critic for Cooperative Multi-Agent Reinforcement Learning
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2755-2763, 2024.
Abstract
Multi-agent reinforcement learning has a wide range of applications in cooperative settings, but ensuring data privacy among agents is a significant challenge. To address this challenge, we propose Privacy-Preserving Decentralized Actor-Critic (PPDAC), an algorithm that motivates agents to cooperate while maintaining their data privacy. Leveraging trajectory ranking, PPDAC enables the agents to learn a cooperation reward that encourages agents to account for other agents’ preferences. Subsequently, each agent trains a policy that maximizes not only its local reward as in independent actor-critic (IAC) but also the cooperation reward, hence, increasing cooperation. Importantly, communication among agents is restricted to their ranking of trajectories that only include public identifiers without any private local data. Moreover, as an additional layer of privacy, the agents can perturb their rankings with the randomized response method. We evaluate PPDAC on the level-based foraging (LBF) environment and a coin-gathering environment. We compare with IAC and Shared Experience Actor-Critic (SEAC) which achieves SOTA results for the LBF environment. The results show that PPDAC consistently outperforms IAC. In addition, PPDAC outperforms SEAC in the coin-gathering environment and achieves similar performance in the LBF environment, all while providing better privacy.