Predictive Safety Network for Resource-constrained Multi-agent Systems
Proceedings of the Conference on Robot Learning, PMLR 100:283-292, 2020.
Coordinating multiple agents, such as mobile robots, with shared resources, such as common battery charging stations, is a highly relevant but still challenging decision problem. Traditionally, the motion and task planning of multi-agent systems are tackled by either designing ad-hoc decision rules or employing optimization tools. The former requires intensive manual tuning while the latter needs a static and accurate model of the complete system. Both approaches are prone to uncertainties in the robot motion and task execution. In this work, we propose a novel planning framework based on recent advances in deep reinforcement learning. The framework combines a centralized safety policy that acts on direct predictions of future resource levels and a decentralized task policy that optimizes task completions. The safety network is trained using supervised learning without extraneous supervision, while the task policy is trained using concurrent self-play. The whole framework follows a hierarchical structure to avoid the exponential blowup in the state and action space. We demonstrate significant improvements in a practical logistic planning problem for warehouse robots, compared with heuristic solutions, optimization tools and other reinforcement learning methods.