Stochastic Safe Action Model Learning
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:1715-1736, 2026.
Abstract
Hand-crafting models of interactive domains is challenging, especially when the dynamics of the domain are stochastic. We thus wish to learn such models automatically instead. In this work, we propose an algorithm to learn stochastic planning models in which the distribution over the sets of effects for each action has a small support, but each set may assign values to an arbitrary number of attributes. This class captures many benchmark domains, in contrast to prior work that assumed independence of the effects on individual attributes. Our algorithm has polynomial time and sample complexity when the support size is bounded by a constant. Importantly, our method is safe in that we learn offline from example trajectories and we guarantee that actions are only permitted in states where our model of the dynamics is accurate. Moreover, we guarantee approximate completeness of the model, in the sense that if the examples achieve goals from some distribution, then with high probability there will exist plans in our learned model that achieve goals from the same distribution.