Differentiable Synthesis of Behavior Tree Architectures and Execution Nodes

Yu Huang, Ziji Wu, Kexin Ma, Ji Wang
Proceedings of the International Conference on Neuro-symbolic Systems, PMLR 288:231-259, 2025.

Abstract

Deep reinforcement learning (DRL) has achieved remarkable success in solving complex control tasks. However, neural network policies often lack interpretability and struggle to generalize to new scenarios without further training. Behavior trees (BTs) offer a more interpretable policy representation, making them a promising alternative. Yet, the automatic synthesis of BTs remains a challenge due to the discrete search space and the need to adapt to diverse scenarios. Prior works often come at the cost of fixed or constrained architectures, or rely on customized execution nodes. We propose an end-to-end synthesis framework that simultaneously generates the architectures and execution nodes of BTs solely from environment rewards. We first conduct architecture search on top of a continuous relaxation of the architecture search space derived from a given grammar. To tackle the discrete execution mechanism and non-differentiable semantics of BTs, we redefine the execution mechanism and interpret the semantics in terms of a differentiable approximation. We also propose an efficient extraction algorithm that leverages the fallback structure of BTs to instantiate a valid BT architecture. This algorithm recovers the performance damaged by the co-adaptation and continuous approximation. Experimental results show the superior performance and generalization of our synthesized BTs, demonstrating the efficacy of the proposed framework.
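As a rough illustration of what a differentiable approximation of fallback semantics can look like, the sketch below blends the outputs of a fallback node's children instead of picking one discretely. It is not the paper's formulation: the function name `soft_fallback`, the sigmoid-gated conditions, and the scalar actions are all assumptions made for this example. Child i receives weight p_i multiplied by the product of (1 - p_j) over all earlier children j, and the last child acts as the default.

```python
import math


def sigmoid(x):
    """Squash a real-valued condition score into a soft success probability."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_fallback(condition_scores, actions):
    """Blend the children of a fallback node (illustrative assumption only).

    condition_scores: one real-valued score per guarded child, in priority order.
    actions: one scalar action per child, plus a final default action.
    Returns (weights, blended_action).
    """
    if len(actions) != len(condition_scores) + 1:
        raise ValueError("expected one default action after the guarded children")

    weights = []
    reach = 1.0  # soft probability that every earlier child has failed
    for score in condition_scores:
        p = sigmoid(score)
        weights.append(reach * p)  # child runs only if all earlier ones failed
        reach *= 1.0 - p
    weights.append(reach)  # the default child absorbs the remaining mass

    blended = sum(w * a for w, a in zip(weights, actions))
    return weights, blended


weights, blended = soft_fallback([0.0, 0.0], [1.0, 2.0, 3.0])
```

Because the weights vary smoothly with the condition scores, gradients can flow through the choice of child. A discrete tree can later be recovered by thresholding each condition, which is one simple (assumed) way to read an extraction step.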

Cite this Paper


BibTeX
@InProceedings{pmlr-v288-huang25a,
  title = {Differentiable Synthesis of Behavior Tree Architectures and Execution Nodes},
  author = {Huang, Yu and Wu, Ziji and Ma, Kexin and Wang, Ji},
  booktitle = {Proceedings of the International Conference on Neuro-symbolic Systems},
  pages = {231--259},
  year = {2025},
  editor = {Pappas, George and Ravikumar, Pradeep and Seshia, Sanjit A.},
  volume = {288},
  series = {Proceedings of Machine Learning Research},
  month = {28--30 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v288/main/assets/huang25a/huang25a.pdf},
  url = {https://proceedings.mlr.press/v288/huang25a.html},
  abstract = {Deep reinforcement learning (DRL) has achieved remarkable success in solving complex control tasks. However, neural network policies often lack interpretability and struggle to generalize to new scenarios without further training. Behavior trees (BTs) offer a more interpretable policy representation, making them a promising alternative. Yet, the automatic synthesis of BTs remains a challenge due to the discrete search space and the need to adapt to diverse scenarios. Prior works often come at the cost of fixed or constrained architectures, or rely on customized execution nodes. We propose an end-to-end synthesis framework that simultaneously generates the architectures and execution nodes of BTs solely from environment rewards. We first conduct architecture search on top of a continuous relaxation of the architecture search space derived from a given grammar. To tackle the discrete execution mechanism and non-differentiable semantics of BTs, we redefine the execution mechanism and interpret the semantics in terms of a differentiable approximation. We also propose an efficient extraction algorithm that leverages the fallback structure of BTs to instantiate a valid BT architecture. This algorithm recovers the performance damaged by the co-adaptation and continuous approximation. Experimental results show the superior performance and generalization of our synthesized BTs, demonstrating the efficacy of the proposed framework.}
}
Endnote
%0 Conference Paper
%T Differentiable Synthesis of Behavior Tree Architectures and Execution Nodes
%A Yu Huang
%A Ziji Wu
%A Kexin Ma
%A Ji Wang
%B Proceedings of the International Conference on Neuro-symbolic Systems
%C Proceedings of Machine Learning Research
%D 2025
%E George Pappas
%E Pradeep Ravikumar
%E Sanjit A. Seshia
%F pmlr-v288-huang25a
%I PMLR
%P 231--259
%U https://proceedings.mlr.press/v288/huang25a.html
%V 288
%X Deep reinforcement learning (DRL) has achieved remarkable success in solving complex control tasks. However, neural network policies often lack interpretability and struggle to generalize to new scenarios without further training. Behavior trees (BTs) offer a more interpretable policy representation, making them a promising alternative. Yet, the automatic synthesis of BTs remains a challenge due to the discrete search space and the need to adapt to diverse scenarios. Prior works often come at the cost of fixed or constrained architectures, or rely on customized execution nodes. We propose an end-to-end synthesis framework that simultaneously generates the architectures and execution nodes of BTs solely from environment rewards. We first conduct architecture search on top of a continuous relaxation of the architecture search space derived from a given grammar. To tackle the discrete execution mechanism and non-differentiable semantics of BTs, we redefine the execution mechanism and interpret the semantics in terms of a differentiable approximation. We also propose an efficient extraction algorithm that leverages the fallback structure of BTs to instantiate a valid BT architecture. This algorithm recovers the performance damaged by the co-adaptation and continuous approximation. Experimental results show the superior performance and generalization of our synthesized BTs, demonstrating the efficacy of the proposed framework.
APA
Huang, Y., Wu, Z., Ma, K., & Wang, J. (2025). Differentiable Synthesis of Behavior Tree Architectures and Execution Nodes. Proceedings of the International Conference on Neuro-symbolic Systems, in Proceedings of Machine Learning Research 288:231-259. Available from https://proceedings.mlr.press/v288/huang25a.html.

Related Material