LLM-Guided Probabilistic Program Induction for POMDP Model Estimation

Aidan Curtis, Hao Tang, Thiago Veloso, Kevin Ellis, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
Proceedings of The 9th Conference on Robot Learning, PMLR 305:3137-3184, 2025.

Abstract

Partially Observable Markov Decision Processes (POMDPs) model decision making under uncertainty. While there are many approaches to approximately solving POMDPs, we aim to address the problem of learning such models. In particular, we are interested in a subclass of POMDPs wherein the components of the model, including the observation function, reward function, transition function, and initial state distribution function, can be modeled as low-complexity probabilistic graphical models in the form of a short probabilistic program. Our strategy to learn these programs uses an LLM as a prior, generating candidate probabilistic programs that are then tested against the empirical distribution and adjusted through feedback. We experiment on a number of classical toy POMDP problems, simulated MiniGrid domains, and two real mobile-base robotics search domains involving partial observability. Our results show that using an LLM to guide the construction of a low-complexity POMDP model can be more effective than tabular POMDP learning, behavior cloning, or direct LLM planning.
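The abstract sketches a propose-test-refine loop: an LLM proposes candidate probabilistic programs for the POMDP components, each candidate is scored against the empirical distribution of observed trajectories, and the result is fed back to the LLM to guide the next proposal. Below is a minimal illustrative sketch of such a loop in Python. It is not the authors' implementation: the interface (a model object with init/transition/observation methods returning discrete distributions, and a propose callable wrapping the LLM) is a hypothetical assumption, and candidates are scored only by observation log-likelihood under standard belief filtering, ignoring rewards for brevity.

import math
from typing import Callable, List, Tuple

# A trajectory is a sequence of (action, observation, reward) steps.
Trajectory = List[Tuple[str, str, float]]

def log_likelihood(model, trajectories: List[Trajectory]) -> float:
    """Score a candidate model by the log-probability of the observed data.

    Hypothetical interface (not the paper's API): `model.init()` returns a
    dict mapping states to probabilities; `model.transition(s, a)` and
    `model.observation(s2, a)` return discrete distributions as dicts.
    Only the observation likelihood is scored; reward fit is omitted.
    """
    total = 0.0
    for traj in trajectories:
        belief = model.init()  # dict: state -> probability
        for action, obs, _reward in traj:
            # Belief filtering: push the belief through the transition
            # model, then weight by the probability of the observation.
            new_belief = {}
            for s, p in belief.items():
                for s2, p_trans in model.transition(s, action).items():
                    p_obs = model.observation(s2, action).get(obs, 0.0)
                    new_belief[s2] = new_belief.get(s2, 0.0) + p * p_trans * p_obs
            obs_prob = sum(new_belief.values())  # P(obs | history, action)
            if obs_prob == 0.0:
                return float("-inf")  # candidate cannot explain the data
            belief = {s: p / obs_prob for s, p in new_belief.items()}
            total += math.log(obs_prob)
    return total

def induce_pomdp(propose: Callable[[str], object],
                 trajectories: List[Trajectory],
                 iterations: int = 10):
    """LLM-in-the-loop search: propose a program, test it, feed back, repeat.

    `propose` wraps the LLM: it takes a feedback string and returns a
    candidate model (e.g., compiled from LLM-generated program source).
    """
    best, best_score = None, float("-inf")
    feedback = "initial attempt"
    for _ in range(iterations):
        candidate = propose(feedback)
        score = log_likelihood(candidate, trajectories)
        if score > best_score:
            best, best_score = candidate, score
        feedback = f"log-likelihood {score:.2f}; revise the program to fit the data better"
    return best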

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-curtis25a,
  title = {LLM-Guided Probabilistic Program Induction for POMDP Model Estimation},
  author = {Curtis, Aidan and Tang, Hao and Veloso, Thiago and Ellis, Kevin and Tenenbaum, Joshua B. and Lozano-P\'{e}rez, Tom\'{a}s and Kaelbling, Leslie Pack},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages = {3137--3184},
  year = {2025},
  editor = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume = {305},
  series = {Proceedings of Machine Learning Research},
  month = {27--30 Sep},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/curtis25a/curtis25a.pdf},
  url = {https://proceedings.mlr.press/v305/curtis25a.html},
  abstract = {Partially Observable Markov Decision Processes (POMDPs) model decision making under uncertainty. While there are many approaches to approximately solving POMDPs, we aim to address the problem of learning such models. In particular, we are interested in a subclass of POMDPs wherein the components of the model, including the observation function, reward function, transition function, and initial state distribution function, can be modeled as low-complexity probabilistic graphical models in the form of a short probabilistic program. Our strategy to learn these programs uses an LLM as a prior, generating candidate probabilistic programs that are then tested against the empirical distribution and adjusted through feedback. We experiment on a number of classical toy POMDP problems, simulated MiniGrid domains, and two real mobile-base robotics search domains involving partial observability. Our results show that using an LLM to guide the construction of a low-complexity POMDP model can be more effective than tabular POMDP learning, behavior cloning, or direct LLM planning.}
}
Endnote
%0 Conference Paper
%T LLM-Guided Probabilistic Program Induction for POMDP Model Estimation
%A Aidan Curtis
%A Hao Tang
%A Thiago Veloso
%A Kevin Ellis
%A Joshua B. Tenenbaum
%A Tomás Lozano-Pérez
%A Leslie Pack Kaelbling
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-curtis25a
%I PMLR
%P 3137--3184
%U https://proceedings.mlr.press/v305/curtis25a.html
%V 305
%X Partially Observable Markov Decision Processes (POMDPs) model decision making under uncertainty. While there are many approaches to approximately solving POMDPs, we aim to address the problem of learning such models. In particular, we are interested in a subclass of POMDPs wherein the components of the model, including the observation function, reward function, transition function, and initial state distribution function, can be modeled as low-complexity probabilistic graphical models in the form of a short probabilistic program. Our strategy to learn these programs uses an LLM as a prior, generating candidate probabilistic programs that are then tested against the empirical distribution and adjusted through feedback. We experiment on a number of classical toy POMDP problems, simulated MiniGrid domains, and two real mobile-base robotics search domains involving partial observability. Our results show that using an LLM to guide the construction of a low-complexity POMDP model can be more effective than tabular POMDP learning, behavior cloning, or direct LLM planning.
APA
Curtis, A., Tang, H., Veloso, T., Ellis, K., Tenenbaum, J.B., Lozano-Pérez, T. & Kaelbling, L.P. (2025). LLM-Guided Probabilistic Program Induction for POMDP Model Estimation. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:3137-3184. Available from https://proceedings.mlr.press/v305/curtis25a.html.