Recursively-Constrained Partially Observable Markov Decision Processes

Qi Heng Ho, Tyler Becker, Benjamin Kraske, Zakariya Laouar, Martin Feather, Federico Rossi, Morteza Lahijanian, Zachary Sunberg
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:1658-1680, 2024.

Abstract

Many sequential decision problems involve optimizing one objective function while imposing constraints on other objectives. Constrained Partially Observable Markov Decision Processes (C-POMDPs) model this case with transition uncertainty and partial observability. In this work, we first show that C-POMDPs violate the optimal substructure property over successive decision steps and thus may exhibit behaviors that are undesirable for some (e.g., safety-critical) applications. Additionally, online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation. To address these drawbacks, we introduce the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies and that optimal policies obey Bellman’s principle of optimality. We also present a point-based dynamic programming algorithm for RC-POMDPs. Evaluations on benchmark problems demonstrate the efficacy of our algorithm and show that policies for RC-POMDPs produce more desirable behaviors than policies for C-POMDPs.
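To make the contrast concrete, the following is a minimal sketch in our own notation; the symbols and the residual-budget update shown are illustrative assumptions, not the paper's exact formulation. A C-POMDP constrains expected cumulative cost only at the initial belief b_0, whereas an RC-POMDP additionally requires the constraint to hold at every history reached during execution:

% Illustrative notation only; see the paper for the exact definitions.
% C-POMDP: a single expected-cost constraint, evaluated at the initial belief b_0.
\max_{\pi}\ \mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r_t \,\Big|\, b_0\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} c_t \,\Big|\, b_0\Big] \le \hat{c}

% RC-POMDP: the cost constraint is imposed recursively at every reachable
% history h_t, against a history-dependent residual budget \hat{c}_{h_t}
% (one plausible budget update is shown; the paper defines the exact rule).
\mathbb{E}_{\pi}\Big[\textstyle\sum_{k=t}^{\infty} \gamma^{k-t} c_k \,\Big|\, h_t\Big] \le \hat{c}_{h_t},
\qquad
\hat{c}_{h_{t+1}} = \big(\hat{c}_{h_t} - \mathbb{E}_{\pi}[c_t \mid h_t]\big)/\gamma

Under this reading, enforcing the constraint at every history rules out policies that satisfy the initial-belief constraint only on average, e.g., by accepting very high expected cost along some observation branches; this is the mechanism behind the restored Bellman optimality and deterministic optimal policies claimed in the abstract.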

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-ho24a,
  title     = {Recursively-Constrained Partially Observable Markov Decision Processes},
  author    = {Ho, Qi Heng and Becker, Tyler and Kraske, Benjamin and Laouar, Zakariya and Feather, Martin and Rossi, Federico and Lahijanian, Morteza and Sunberg, Zachary},
  booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence},
  pages     = {1658--1680},
  year      = {2024},
  editor    = {Kiyavash, Negar and Mooij, Joris M.},
  volume    = {244},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/ho24a/ho24a.pdf},
  url       = {https://proceedings.mlr.press/v244/ho24a.html}
}
Endnote
%0 Conference Paper
%T Recursively-Constrained Partially Observable Markov Decision Processes
%A Qi Heng Ho
%A Tyler Becker
%A Benjamin Kraske
%A Zakariya Laouar
%A Martin Feather
%A Federico Rossi
%A Morteza Lahijanian
%A Zachary Sunberg
%B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2024
%E Negar Kiyavash
%E Joris M. Mooij
%F pmlr-v244-ho24a
%I PMLR
%P 1658--1680
%U https://proceedings.mlr.press/v244/ho24a.html
%V 244
APA
Ho, Q.H., Becker, T., Kraske, B., Laouar, Z., Feather, M., Rossi, F., Lahijanian, M. & Sunberg, Z. (2024). Recursively-Constrained Partially Observable Markov Decision Processes. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:1658-1680. Available from https://proceedings.mlr.press/v244/ho24a.html.
