Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

Uri Gadot, Kaixin Wang, Navdeep Kumar, Kfir Yehuda Levy, Shie Mannor
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:14408-14432, 2024.

Abstract

Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel. However, current RMDP methods are often limited to small-scale problems, hindering their use in high-dimensional domains. To bridge this gap, we present EWoK, a novel online approach to solve RMDPs that Estimates the Worst transition Kernel to learn robust policies. Unlike previous works that regularize the policy or value updates, EWoK achieves robustness by simulating the worst scenarios for the agent while retaining complete flexibility in the learning process. Notably, EWoK can be applied on top of any off-the-shelf non-robust RL algorithm, enabling easy scaling to high-dimensional domains. Our experiments, spanning from simple Cartpole to high-dimensional DeepMind Control Suite environments, demonstrate the effectiveness and applicability of the EWoK paradigm as a practical method for learning robust policies.
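The core idea lends itself to a small sketch. The following is an illustrative toy example, not the authors' implementation: it assumes a KL-constrained uncertainty set, for which the worst-case kernel is an exponential tilting of the nominal kernel by the negative value function, and it assumes access to a handful of candidate next states drawn from the nominal simulator. All names here (worst_kernel_step, beta, value_fn) are hypothetical.

import numpy as np

def worst_kernel_step(candidate_next_states, value_fn, beta=1.0, rng=None):
    """Pick one next state from nominal candidates, biased toward
    low-value states, approximating the worst kernel in a KL ball.

    beta > 0 plays the role of the uncertainty radius: a smaller beta
    yields a more adversarial (more strongly tilted) kernel.
    """
    rng = rng or np.random.default_rng()
    values = np.array([value_fn(s) for s in candidate_next_states])
    logits = -values / beta          # tilt toward low-value next states
    logits -= logits.max()           # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()             # normalize into a distribution
    idx = rng.choice(len(candidate_next_states), p=probs)
    return candidate_next_states[idx]

In training, a step like this would sit inside an environment wrapper: the wrapper queries the nominal simulator for several candidate next states, scores them with the learner's current value estimate, and returns the tilted sample. The off-the-shelf, non-robust agent is left unchanged; it simply experiences harsher transitions, which is what "simulating the worst scenarios" refers to in the abstract.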

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-gadot24a,
  title     = {Bring Your Own ({N}on-Robust) Algorithm to Solve Robust {MDP}s by Estimating The Worst Kernel},
  author    = {Gadot, Uri and Wang, Kaixin and Kumar, Navdeep and Levy, Kfir Yehuda and Mannor, Shie},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {14408--14432},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/gadot24a/gadot24a.pdf},
  url       = {https://proceedings.mlr.press/v235/gadot24a.html},
  abstract  = {Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel. However, current RMDP methods are often limited to small-scale problems, hindering their use in high-dimensional domains. To bridge this gap, we present EWoK, a novel online approach to solve RMDPs that Estimates the Worst transition Kernel to learn robust policies. Unlike previous works that regularize the policy or value updates, EWoK achieves robustness by simulating the worst scenarios for the agent while retaining complete flexibility in the learning process. Notably, EWoK can be applied on top of any off-the-shelf non-robust RL algorithm, enabling easy scaling to high-dimensional domains. Our experiments, spanning from simple Cartpole to high-dimensional DeepMind Control Suite environments, demonstrate the effectiveness and applicability of the EWoK paradigm as a practical method for learning robust policies.}
}
Endnote
%0 Conference Paper
%T Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel
%A Uri Gadot
%A Kaixin Wang
%A Navdeep Kumar
%A Kfir Yehuda Levy
%A Shie Mannor
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-gadot24a
%I PMLR
%P 14408--14432
%U https://proceedings.mlr.press/v235/gadot24a.html
%V 235
%X Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel. However, current RMDP methods are often limited to small-scale problems, hindering their use in high-dimensional domains. To bridge this gap, we present EWoK, a novel online approach to solve RMDPs that Estimates the Worst transition Kernel to learn robust policies. Unlike previous works that regularize the policy or value updates, EWoK achieves robustness by simulating the worst scenarios for the agent while retaining complete flexibility in the learning process. Notably, EWoK can be applied on top of any off-the-shelf non-robust RL algorithm, enabling easy scaling to high-dimensional domains. Our experiments, spanning from simple Cartpole to high-dimensional DeepMind Control Suite environments, demonstrate the effectiveness and applicability of the EWoK paradigm as a practical method for learning robust policies.
APA
Gadot, U., Wang, K., Kumar, N., Levy, K.Y. & Mannor, S. (2024). Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:14408-14432. Available from https://proceedings.mlr.press/v235/gadot24a.html.
