SymNet 3.0: Exploiting Long-Range Influences in Learning Generalized Neural Policies for Relational MDPs
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1921-1931, 2023.
We focus on the learning of generalized neural policies for Relational Markov Decision Processes (RMDPs) expressed in RDDL. Recent work first converts the instances of a relational domain into an instance graph, and then trains a Graph Attention Network (GAT) of fixed depth with parameters shared across instances to learn a state representation, which can be decoded to get the policy [sharma et al., 22]. Unfortunately, this approach struggles to learn policies that exploit long-range dependencies – a fact we formally prove in this paper. As a remedy, we first construct a novel influence graph characterized by edges capturing one-step influence (dependence) between nodes based on the transition model. We then define influence distance between two nodes as the shortest path between them in this graph – a feature we exploit to represent long-range dependencies. We show that our architecture, referred to as Symbolic Influence Network (SymNet3.0), with its distance-based features, does not suffer from the representational issues faced by earlier approaches. Extensive experimentation demonstrates that we are competitive with existing baselines on 12 standard IPPC domains, and perform significantly better on six additional domains (including IPPC variants), designed to test a model’s capability in capturing long-range dependencies. Further analysis shows that SymNet3.0 automatically learns to focus on nodes that have key information for representing policies that capture long-range dependencies.