Stochastic Safe Action Model Learning

Zihao Deng; Brendan Juba

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:1715-1736, 2026.

Abstract

Hand-crafting models of interactive domains is challenging, especially when the dynamics of the domain are stochastic. We thus wish to automatically learn such models instead. In this work, we propose an algorithm to learn stochastic planning models where the distribution over the sets of effects for each action has a small support, but each set may assign values to an arbitrary number of attributes. This class captures many benchmark domains, in contrast to prior work that assumed independence of the effects on individual attributes. Our algorithm has polynomial time and sample complexity when the support size is bounded by a constant. Importantly, our method is safe in that we learn offline from example trajectories and we guarantee that actions are only permitted in states where our model of the dynamics is accurate. Moreover, we guarantee approximate completeness of the model, in the sense that if the examples are achieving goals from some distribution, then with high probability there will exist plans in our learned model that achieve goals from the same distribution.
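To make the model class concrete, the toy sketch below contrasts a small-support distribution over joint effect sets with the factored model obtained by treating each attribute's effect as independent. It is a hypothetical illustration under stated assumptions, not the authors' learning algorithm: states are assumed to be dicts of Boolean attributes, an action's stochastic effect is assumed to be a list of (probability, effect set) pairs, and all names (`SMALL_SUPPORT_EFFECTS`, attributes `a`, `b`, `c`, and the helper functions) are made up for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical action (illustration only): its effect jointly sets attributes
# "a" and "b". Each outcome is an effect set (attribute -> new value); the
# distribution over effect sets has support size 2.
SMALL_SUPPORT_EFFECTS = [
    (Fraction(1, 2), {"a": True, "b": True}),
    (Fraction(1, 2), {"a": False, "b": False}),
]


def apply_effect(state, effect):
    """Successor state: attributes in the effect set are overwritten, others persist."""
    nxt = dict(state)
    nxt.update(effect)
    return nxt


def as_key(state):
    """Hashable, order-independent representation of a state dict."""
    return tuple(sorted(state.items()))


def successor_distribution(state, effects):
    """Exact successor-state distribution for a list of (probability, effect set) pairs."""
    dist = {}
    for p, eff in effects:
        key = as_key(apply_effect(state, eff))
        dist[key] = dist.get(key, 0) + p
    return dist


def factored_distribution(state, effects):
    """Successor distribution if each attribute were set independently from its
    marginal P(attr = True). Assumes every effect set assigns every attribute."""
    attrs = sorted({a for _, eff in effects for a in eff})
    marginal = {a: sum(p for p, eff in effects if eff[a]) for a in attrs}
    dist = {}
    for values in product([True, False], repeat=len(attrs)):
        p = Fraction(1)
        for a, v in zip(attrs, values):
            p *= marginal[a] if v else 1 - marginal[a]
        key = as_key(apply_effect(state, dict(zip(attrs, values))))
        dist[key] = dist.get(key, 0) + p
    return dist


s0 = {"a": False, "b": True, "c": True}
joint = successor_distribution(s0, SMALL_SUPPORT_EFFECTS)
factored = factored_distribution(s0, SMALL_SUPPORT_EFFECTS)
print(len(joint), len(factored))  # prints: 2 4
```

Under these assumptions, the correlated action only ever produces two successor states, while the independent-effects approximation also assigns probability 1/4 to each of the two mixed outcomes in which exactly one of `a`, `b` becomes true, outcomes the actual dynamics never produce.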

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-deng26a,
  title     = {Stochastic Safe Action Model Learning},
  author    = {Deng, Zihao and Juba, Brendan},
  booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages     = {1715--1736},
  year      = {2026},
  editor    = {Hanneke, Steve and Lattimore, Tor},
  volume    = {336},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/deng26a/deng26a.pdf},
  url       = {https://proceedings.mlr.press/v336/deng26a.html},
  abstract  = {Hand-crafting models of interactive domains is challenging, especially when the dynamics of the domain are stochastic. We thus wish to automatically learn such models instead. In this work, we propose an algorithm to learn stochastic planning models where the distribution over the sets of effects for each action has a small support, but each set may assign values to an arbitrary number of attributes. This class captures many benchmark domains, in contrast to prior work that assumed independence of the effects on individual attributes. Our algorithm has polynomial time and sample complexity when the support size is bounded by a constant. Importantly, our method is safe in that we learn offline from example trajectories and we guarantee that actions are only permitted in states where our model of the dynamics is accurate. Moreover, we guarantee approximate completeness of the model, in the sense that if the examples are achieving goals from some distribution, then with high probability there will exist plans in our learned model that achieve goals from the same distribution.}
}

Endnote

%0 Conference Paper
%T Stochastic Safe Action Model Learning
%A Zihao Deng
%A Brendan Juba
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-deng26a
%I PMLR
%P 1715--1736
%U https://proceedings.mlr.press/v336/deng26a.html
%V 336
%X Hand-crafting models of interactive domains is challenging, especially when the dynamics of the domain are stochastic. We thus wish to automatically learn such models instead. In this work, we propose an algorithm to learn stochastic planning models where the distribution over the sets of effects for each action has a small support, but each set may assign values to an arbitrary number of attributes. This class captures many benchmark domains, in contrast to prior work that assumed independence of the effects on individual attributes. Our algorithm has polynomial time and sample complexity when the support size is bounded by a constant. Importantly, our method is safe in that we learn offline from example trajectories and we guarantee that actions are only permitted in states where our model of the dynamics is accurate. Moreover, we guarantee approximate completeness of the model, in the sense that if the examples are achieving goals from some distribution, then with high probability there will exist plans in our learned model that achieve goals from the same distribution.

APA

Deng, Z., & Juba, B. (2026). Stochastic Safe Action Model Learning. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:1715-1736. Available from https://proceedings.mlr.press/v336/deng26a.html.