Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

Jihwan Jeong, Xiaoyu Wang, Jingmin Wang, Scott Sanner, Pascal Poupart
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:27113-27133, 2025.

Abstract

Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while demonstrating resilience to changing environment dynamics, improving the flexibility, generalizability, and robustness of offline-learned policies.
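The "reflect" and "plan" steps described in the abstract can be pictured with a small, self-contained sketch: maintain a belief over environment dynamics, reweight it from newly observed transitions (reflect), then choose actions by averaging planned rollouts over that belief (plan). Everything below (the toy linear model ensemble, the likelihood reweighting, the random-shooting planner, and all names) is an illustrative assumption for exposition, not the paper's actual RefPlan algorithm; it only shows the general pattern of updating a posterior over dynamics at deployment and marginalizing epistemic uncertainty during model-based planning.

import numpy as np

rng = np.random.default_rng(0)

# Approximate posterior over dynamics: a small ensemble of toy linear models
# s' = A s + B a + noise, each carrying a belief weight w.
ensemble = [{"A": np.eye(2) * (0.9 + 0.05 * k),
             "B": np.ones((2, 1)) * 0.1,
             "w": 1.0 / 5} for k in range(5)]

def reflect(ensemble, s, a, s_next, noise_std=0.1):
    """Belief update: reweight each model by the likelihood it assigns
    to the newly observed transition (s, a, s_next)."""
    for m in ensemble:
        pred = m["A"] @ s + (m["B"] @ a).ravel()
        loglik = -np.sum((s_next - pred) ** 2) / (2 * noise_std ** 2)
        m["w"] *= np.exp(loglik)
    total = sum(m["w"] for m in ensemble)
    for m in ensemble:
        m["w"] /= total
    return ensemble

def plan(ensemble, s, horizon=5, n_candidates=64):
    """Random-shooting planner: score each candidate action sequence by
    its belief-weighted (marginalized) return across the ensemble."""
    def reward(s):                      # toy reward: stay near the origin
        return -np.sum(s ** 2)
    best_a, best_val = None, -np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=(horizon, 1))
        val = 0.0
        for m in ensemble:              # marginalize over the dynamics belief
            s_m = s.copy()
            ret = 0.0
            for a in actions:
                s_m = m["A"] @ s_m + (m["B"] @ a).ravel()
                ret += reward(s_m)
            val += m["w"] * ret
        if val > best_val:
            best_val, best_a = val, actions[0]
    return best_a

# Deployment loop sketch: plan with the current belief, act, then reflect.
s = np.array([1.0, -0.5])
a = plan(ensemble, s)
s_next = ensemble[2]["A"] @ s + (ensemble[2]["B"] @ a).ravel()  # stand-in for the real environment
ensemble = reflect(ensemble, s, a, s_next)

The key design point the sketch tries to convey is that the planner never commits to a single learned model: every candidate action sequence is evaluated under the full (approximate) posterior, so actions that are only good under one optimistic model are discounted.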

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-jeong25a,
  title     = {Reflect-then-Plan: Offline Model-Based Planning through a Doubly {B}ayesian Lens},
  author    = {Jeong, Jihwan and Wang, Xiaoyu and Wang, Jingmin and Sanner, Scott and Poupart, Pascal},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {27113--27133},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/jeong25a/jeong25a.pdf},
  url       = {https://proceedings.mlr.press/v267/jeong25a.html},
  abstract  = {Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while demonstrating resilience to changing environment dynamics, improving the flexibility, generalizability, and robustness of offline-learned policies.}
}
Endnote
%0 Conference Paper
%T Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens
%A Jihwan Jeong
%A Xiaoyu Wang
%A Jingmin Wang
%A Scott Sanner
%A Pascal Poupart
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-jeong25a
%I PMLR
%P 27113--27133
%U https://proceedings.mlr.press/v267/jeong25a.html
%V 267
%X Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while demonstrating resilience to changing environment dynamics, improving the flexibility, generalizability, and robustness of offline-learned policies.
APA
Jeong, J., Wang, X., Wang, J., Sanner, S. & Poupart, P. (2025). Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:27113-27133. Available from https://proceedings.mlr.press/v267/jeong25a.html.
