[edit]
Neurosymbolic World Models for Sequential Decision Making
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:23047-23062, 2025.
Abstract
We present Structured World Modeling for Policy Optimization (SWMPO), a framework for unsupervised learning of neurosymbolic Finite State Machines (FSM) that capture environmental structure for policy optimization. Traditional unsupervised world modeling methods rely on unstructured representations, such as neural networks, that do not explicitly represent high-level patterns within the system (e.g., patterns in the dynamics of regions such as water and land). Instead, SWMPO models the environment as a finite state machine (FSM), where each state corresponds to a specific region with distinct dynamics. This structured representation can then be leveraged for tasks like policy optimization. Previous works that synthesize FSMs for this purpose have been limited to discrete spaces, not continuous spaces. Instead, our proposed FSM synthesis algorithm operates in an unsupervised manner, leveraging low-level features from unprocessed, non-visual data, making it adaptable across various domains. The synthesized FSM models are expressive enough to be used in a model-based Reinforcement Learning scheme that leverages offline data to efficiently synthesize environment-specific world models. We demonstrate the advantages of SWMPO by benchmarking its environment modeling capabilities in simulated environments.