Efficient Value Iteration for s-rectangular Robust Markov Decision Processes

Navdeep Kumar, Kaixin Wang, Kfir Yehuda Levy, Shie Mannor
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:25682-25725, 2024.

Abstract

We focus on s-rectangular robust Markov decision processes (MDPs), which capture interconnected uncertainties across different actions within each state. This framework is more general than sa-rectangular robust MDPs, where uncertainties in each action are independent, but the introduced interdependence significantly amplifies the complexity of the problem. Existing methods either come with slow performance guarantees or are inapplicable to even moderately large state spaces. In this work, we derive optimal robust Bellman operators in explicit forms. This leads to robust value iteration methods with significantly faster time complexities than existing approaches, making them usable in large state spaces. Further, our findings reveal that the optimal policies demonstrate a novel threshold behavior, selectively favoring a limited set of actions based on their respective advantage functions. Additionally, our study uncovers a noteworthy connection between the robustness of a policy and the variance in its value function, highlighting that policies with lower variance exhibit greater resilience.
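For orientation, below is a minimal Python sketch of robust value iteration under the simplest possible assumption: an sa-rectangular L-infinity uncertainty ball on rewards only, with radius alpha, whose worst case simply subtracts alpha from each reward. This is an illustrative baseline, not the paper's derived s-rectangular operators, which couple all actions in a state through a shared uncertainty budget and yield the threshold behavior described above.

import numpy as np

def robust_value_iteration(P, R, gamma, alpha, n_iters=1000, tol=1e-8):
    """Value iteration with a crude sa-rectangular reward-uncertainty penalty.

    Illustrative only: the worst case over an L-infinity reward ball of
    radius alpha is R - alpha, so the robust Bellman update just shifts
    every reward down by alpha. The paper's explicit s-rectangular
    operators are more involved and are not reproduced here.

    P: transition kernel, shape (S, A, S)
    R: nominal rewards, shape (S, A)
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(n_iters):
        # Q(s, a) = worst-case reward + gamma * E_{s'~P(.|s,a)}[v(s')]
        q = (R - alpha) + gamma * (P @ v)        # shape (S, A)
        v_new = q.max(axis=1)                    # greedy over actions
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v

# Toy usage: 2 states, 2 actions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
print(robust_value_iteration(P, R, gamma=0.9, alpha=0.1))

Under these assumptions the penalized update is still a gamma-contraction, so the loop converges to a unique fixed point; the point of the paper is that comparably explicit (and hence fast) updates exist for the harder s-rectangular case.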

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-kumar24b,
  title     = {Efficient Value Iteration for s-rectangular Robust {M}arkov Decision Processes},
  author    = {Kumar, Navdeep and Wang, Kaixin and Levy, Kfir Yehuda and Mannor, Shie},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {25682--25725},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/kumar24b/kumar24b.pdf},
  url       = {https://proceedings.mlr.press/v235/kumar24b.html},
  abstract  = {We focus on s-rectangular robust Markov decision processes (MDPs), which capture interconnected uncertainties across different actions within each state. This framework is more general compared to sa-rectangular robust MDPs, where uncertainties in each action are independent. However, the introduced interdependence significantly amplifies the complexity of the problem. Existing methods either have slow performance guarantees or are inapplicable to even moderately large state spaces. In this work, we derive optimal robust Bellman operators in explicit forms. This leads to robust value iteration methods with significantly faster time complexities than existing approaches, which can be used in large state spaces. Further, our findings reveal that the optimal policies demonstrate a novel threshold behavior, selectively favoring a limited set of actions based on their respective advantage functions. Additionally, our study uncovers a noteworthy connection between the robustness of a policy and the variance in its value function, highlighting that policies with lower variance exhibit greater resilience.}
}
Endnote
%0 Conference Paper
%T Efficient Value Iteration for s-rectangular Robust Markov Decision Processes
%A Navdeep Kumar
%A Kaixin Wang
%A Kfir Yehuda Levy
%A Shie Mannor
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-kumar24b
%I PMLR
%P 25682--25725
%U https://proceedings.mlr.press/v235/kumar24b.html
%V 235
%X We focus on s-rectangular robust Markov decision processes (MDPs), which capture interconnected uncertainties across different actions within each state. This framework is more general compared to sa-rectangular robust MDPs, where uncertainties in each action are independent. However, the introduced interdependence significantly amplifies the complexity of the problem. Existing methods either have slow performance guarantees or are inapplicable to even moderately large state spaces. In this work, we derive optimal robust Bellman operators in explicit forms. This leads to robust value iteration methods with significantly faster time complexities than existing approaches, which can be used in large state spaces. Further, our findings reveal that the optimal policies demonstrate a novel threshold behavior, selectively favoring a limited set of actions based on their respective advantage functions. Additionally, our study uncovers a noteworthy connection between the robustness of a policy and the variance in its value function, highlighting that policies with lower variance exhibit greater resilience.
APA
Kumar, N., Wang, K., Levy, K.Y. & Mannor, S. (2024). Efficient Value Iteration for s-rectangular Robust Markov Decision Processes. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:25682-25725. Available from https://proceedings.mlr.press/v235/kumar24b.html.
