Neuro-Symbolic Inverse Constrained Reinforcement Learning

Oliver Deane, Oliver Ray
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, PMLR 284:913-925, 2025.

Abstract

Inverse Constrained Reinforcement Learning (ICRL) is an established field of policy learning that augments reward-driven exploratory optimisation with example-driven constraint inference aimed at exploiting limited observations of expert behaviour. This paper proposes a generalisation of ICRL that employs weighted constraints to better support lifelong learning and to handle domains with potentially conflicting social norms. We introduce a Neuro-Symbolic ICRL approach (NSICRL) with two key components: a symbolic system based on Inductive Logic Programming (ILP) that infers first-order constraints which are human-interpretable and generalise across environment configurations; and a neural system based on Deep Q learning (DQL) that efficiently learns near-optimal policies subject to those constraints. By weighting the high-level ILP constraints (based on the order in which they are learnt) and encoding them as low-level state-action penalties in the DQL reward function, we effectively allow earlier constraints to be overridden by later ones. Unlike prior work in ICRL, our approach is able to continue working when exposed to newly encountered expert behaviours that reveal more nuanced exceptions to previously learnt constraints. We evaluate NSICRL in a simulated traffic domain, which shows how it outperforms existing methods in terms of efficiency and accuracy when learning hard constraints; and which also shows the utility of learning defeasible norms in an ICRL context. To the best of our knowledge, this is the first approach that places equal emphasis on exploratory and imitative learning while also being able to infer defeasible norms in an interpretable way that scales to non-trivial examples.
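The abstract describes the core mechanism only at a high level: first-order constraints inferred by ILP are weighted according to the order in which they are learnt and folded into the DQL reward as state-action penalties, so that a norm learnt later can override one learnt earlier. The short Python sketch below illustrates one plausible reading of that idea; the Constraint class, the shaped_reward helper, the traffic-light example, and the "most recently learnt violated constraint wins" rule are illustrative assumptions for exposition, not the authors' implementation.

# Minimal sketch (assumed, not the paper's code): ordered, weighted constraint
# penalties folded into a reward signal so newer norms override older ones.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Constraint:
    # True when the (state, action) pair violates this learnt norm
    violated: Callable[[dict, str], bool]
    # Penalty weight, here assumed to be assigned by learning order
    weight: float

def shaped_reward(base_reward: float, state: dict, action: str,
                  constraints: List[Constraint]) -> float:
    """Subtract the penalty of the most recently learnt violated constraint.

    Constraints are ordered oldest-to-newest, so a newer, more specific norm
    (an exception) can override an older, more general one.
    """
    penalty = 0.0
    for c in constraints:  # later entries override earlier ones
        if c.violated(state, action):
            penalty = c.weight
    return base_reward - penalty

# Illustrative traffic norms: an early blanket constraint ("never enter the
# junction") later refined by an exception ("entering is fine on green").
never_enter = Constraint(lambda s, a: a == "enter", weight=10.0)
unless_green = Constraint(lambda s, a: a == "enter" and s.get("light") == "green",
                          weight=0.0)  # exception carries no penalty
constraints = [never_enter, unless_green]

print(shaped_reward(1.0, {"light": "green"}, "enter", constraints))  # 1.0
print(shaped_reward(1.0, {"light": "red"}, "enter", constraints))    # -9.0

In an actual DQL loop, a shaping function of this kind would simply replace the raw environment reward when updating Q-values, which is consistent with the abstract's description of encoding constraints as low-level state-action penalties while leaving the exploratory learning algorithm itself unchanged.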

Cite this Paper


BibTeX
@InProceedings{pmlr-v284-deane25a,
  title     = {Neuro-Symbolic Inverse Constrained Reinforcement Learning},
  author    = {Deane, Oliver and Ray, Oliver},
  booktitle = {Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning},
  pages     = {913--925},
  year      = {2025},
  editor    = {H. Gilpin, Leilani and Giunchiglia, Eleonora and Hitzler, Pascal and van Krieken, Emile},
  volume    = {284},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--10 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v284/main/assets/deane25a/deane25a.pdf},
  url       = {https://proceedings.mlr.press/v284/deane25a.html}
}
Endnote
%0 Conference Paper
%T Neuro-Symbolic Inverse Constrained Reinforcement Learning
%A Oliver Deane
%A Oliver Ray
%B Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2025
%E Leilani H. Gilpin
%E Eleonora Giunchiglia
%E Pascal Hitzler
%E Emile van Krieken
%F pmlr-v284-deane25a
%I PMLR
%P 913--925
%U https://proceedings.mlr.press/v284/deane25a.html
%V 284
APA
Deane, O. & Ray, O. (2025). Neuro-Symbolic Inverse Constrained Reinforcement Learning. Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning, in Proceedings of Machine Learning Research 284:913-925. Available from https://proceedings.mlr.press/v284/deane25a.html.