Modeling Partially Observable Systems using Graph-Based Memory and Topological Priors
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:59-73, 2022.
Abstract
Solving partially observable Markov decision processes (POMDPs) is critical when applying reinforcement learning to real-world problems, where agents have an incomplete view of the world. Recurrent neural networks (RNNs) are the de facto approach for solving POMDPs in reinforcement learning (RL). Although they perform well in supervised learning, noisy gradients reduce their capabilities in RL. Furthermore, they cannot utilize prior human knowledge to bootstrap or stabilize learning. This leads researchers to hand-design task-specific memory models based on their prior knowledge of the task at hand. In this paper, we present graph convolutional memory (GCM), the first RL memory framework with swappable task-specific priors, enabling users to inject expertise into their models. GCM uses human-defined topological priors to form graph neighborhoods, combining them into a larger network topology. We query the graph using graph convolution, coalescing relevant memories into a context-dependent summary of the past. Results demonstrate that GCM outperforms state-of-the-art methods on control, memorization, and navigation tasks while using fewer parameters.
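To make the mechanism described above concrete, the following is a minimal sketch (not the authors' implementation) of a graph-based memory: each observation becomes a node, a user-supplied topological prior decides which past nodes it connects to, and a single graph-convolution step over the resulting adjacency produces a context-dependent summary for the newest node. The class name `GraphConvMemory`, the `prior` callback signature, and the single-layer convolution are all illustrative assumptions.

```python
import numpy as np

class GraphConvMemory:
    """Hypothetical sketch of a graph-based memory with a swappable prior."""

    def __init__(self, obs_dim, hidden_dim, prior, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(obs_dim, hidden_dim))
        self.prior = prior    # prior(i, j, nodes) -> bool: connect node i to past node j?
        self.nodes = []       # observation vectors, one per node
        self.edges = set()    # undirected edges as (i, j) index pairs

    def write(self, obs):
        """Add an observation as a node; the prior forms its neighborhood."""
        i = len(self.nodes)
        self.nodes.append(np.asarray(obs, dtype=float))
        for j in range(i):
            if self.prior(i, j, self.nodes):
                self.edges.add((i, j))
        return i

    def read(self):
        """One graph-convolution pass; return the newest node's summary."""
        n = len(self.nodes)
        X = np.stack(self.nodes)                 # node features, (n, obs_dim)
        A = np.eye(n)                            # adjacency with self-loops
        for i, j in self.edges:
            A[i, j] = A[j, i] = 1.0
        d = A.sum(axis=1)
        A_hat = A / np.sqrt(np.outer(d, d))      # symmetric normalization
        H = np.tanh(A_hat @ X @ self.W)          # aggregate neighbors, (n, hidden_dim)
        return H[-1]                             # summary for the newest node

# Example topological prior: link each new node to its immediate
# predecessor, yielding a simple temporal-chain topology.
temporal = lambda i, j, nodes: j == i - 1

mem = GraphConvMemory(obs_dim=4, hidden_dim=8, prior=temporal)
for t in range(5):
    mem.write(np.ones(4) * t)
summary = mem.read()
print(summary.shape)  # (8,)
```

Swapping the `prior` callable (e.g., connecting nodes whose observations are spatially close rather than temporally adjacent) is the analogue of injecting task-specific expertise without changing the rest of the model.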