SymNet 3.0: Exploiting Long-Range Influences in Learning Generalized Neural Policies for Relational MDPs

Vishal Sharma; Daman Arora; Mausam; Parag Singla

Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1921-1931, 2023.

Abstract

We focus on the learning of generalized neural policies for Relational Markov Decision Processes (RMDPs) expressed in RDDL. Recent work first converts the instances of a relational domain into an instance graph, and then trains a Graph Attention Network (GAT) of fixed depth with parameters shared across instances to learn a state representation, which can be decoded to get the policy [sharma et al., 22]. Unfortunately, this approach struggles to learn policies that exploit long-range dependencies – a fact we formally prove in this paper. As a remedy, we first construct a novel influence graph characterized by edges capturing one-step influence (dependence) between nodes based on the transition model. We then define influence distance between two nodes as the shortest path between them in this graph – a feature we exploit to represent long-range dependencies. We show that our architecture, referred to as Symbolic Influence Network (SymNet3.0), with its distance-based features, does not suffer from the representational issues faced by earlier approaches. Extensive experimentation demonstrates that we are competitive with existing baselines on 12 standard IPPC domains, and perform significantly better on six additional domains (including IPPC variants), designed to test a model’s capability in capturing long-range dependencies. Further analysis shows that SymNet3.0 automatically learns to focus on nodes that have key information for representing policies that capture long-range dependencies.
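The influence distance described in the abstract, the shortest path between two nodes in the influence graph, can be illustrated with a breadth-first search. The following is a minimal sketch only, not the authors' implementation: the function name `influence_distances`, the toy graph, its node names, and the choice to treat edges as directed are all assumptions made for illustration.

```python
from collections import deque


def influence_distances(graph, source):
    """Return the hop count of the shortest path from source to each reachable node.

    graph is an adjacency mapping: node -> iterable of nodes it influences
    in one step (a hypothetical representation, not taken from the paper).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:  # first visit is the shortest in BFS
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical toy influence graph: an edge u -> v means u influences v in one step.
toy_graph = {
    "a": ["b"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}

distances = influence_distances(toy_graph, "a")
```

In this toy chain, node "d" is three one-step influences away from "a". A message-passing network of fixed depth smaller than three would not propagate information between them, which is the kind of long-range dependency the abstract refers to.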

Cite this Paper

BibTeX


@InProceedings{pmlr-v216-sharma23c,
  title = 	 {{SymNet 3.0}: Exploiting Long-Range Influences in Learning Generalized Neural Policies for Relational {MDPs}},
  author =       {Sharma, Vishal and Arora, Daman and Mausam and Singla, Parag},
  booktitle = 	 {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {1921--1931},
  year = 	 {2023},
  editor = 	 {Evans, Robin J. and Shpitser, Ilya},
  volume = 	 {216},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {31 Jul--04 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v216/sharma23c/sharma23c.pdf},
  url = 	 {https://proceedings.mlr.press/v216/sharma23c.html},
  abstract = 	 {We focus on the learning of generalized neural policies for Relational Markov Decision Processes (RMDPs) expressed in RDDL. Recent work first converts the instances of a relational domain into an instance graph, and then trains a Graph Attention Network (GAT) of fixed depth with parameters shared across instances to learn a state representation, which can be decoded to get the policy [sharma et al., 22]. Unfortunately, this approach struggles to learn policies that exploit long-range dependencies – a fact we formally prove in this paper. As a remedy, we first construct a novel influence graph characterized by edges capturing one-step influence (dependence) between nodes based on the transition model. We then define influence distance between two nodes as the shortest path between them in this graph – a feature we exploit to represent long-range dependencies. We show that our architecture, referred to as Symbolic Influence Network (SymNet3.0), with its distance-based features, does not suffer from the representational issues faced by earlier approaches. Extensive experimentation demonstrates that we are competitive with existing baselines on 12 standard IPPC domains, and perform significantly better on six additional domains (including IPPC variants), designed to test a model’s capability in capturing long-range dependencies. Further analysis shows that SymNet3.0 automatically learns to focus on nodes that have key information for representing policies that capture long-range dependencies.}
}

Endnote

%0 Conference Paper
%T SymNet 3.0: Exploiting Long-Range Influences in Learning Generalized Neural Policies for Relational MDPs
%A Vishal Sharma
%A Daman Arora
%A Mausam
%A Parag Singla
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-sharma23c
%I PMLR
%P 1921--1931
%U https://proceedings.mlr.press/v216/sharma23c.html
%V 216
%X We focus on the learning of generalized neural policies for Relational Markov Decision Processes (RMDPs) expressed in RDDL. Recent work first converts the instances of a relational domain into an instance graph, and then trains a Graph Attention Network (GAT) of fixed depth with parameters shared across instances to learn a state representation, which can be decoded to get the policy [sharma et al., 22]. Unfortunately, this approach struggles to learn policies that exploit long-range dependencies – a fact we formally prove in this paper. As a remedy, we first construct a novel influence graph characterized by edges capturing one-step influence (dependence) between nodes based on the transition model. We then define influence distance between two nodes as the shortest path between them in this graph – a feature we exploit to represent long-range dependencies. We show that our architecture, referred to as Symbolic Influence Network (SymNet3.0), with its distance-based features, does not suffer from the representational issues faced by earlier approaches. Extensive experimentation demonstrates that we are competitive with existing baselines on 12 standard IPPC domains, and perform significantly better on six additional domains (including IPPC variants), designed to test a model’s capability in capturing long-range dependencies. Further analysis shows that SymNet3.0 automatically learns to focus on nodes that have key information for representing policies that capture long-range dependencies.

APA


Sharma, V., Arora, D., Mausam, & Singla, P. (2023). SymNet 3.0: Exploiting Long-Range Influences in Learning Generalized Neural Policies for Relational MDPs. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:1921-1931. Available from https://proceedings.mlr.press/v216/sharma23c.html.
