Fast Bellman Updates for Robust MDPs

Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1979-1988, 2018.

Abstract

We describe two efficient, and exact, algorithms for computing Bellman updates in robust Markov decision processes (MDPs). The first algorithm uses a homotopy continuation method to compute updates for L1-constrained s,a-rectangular ambiguity sets. It runs in quasi-linear time for plain L1-norms and also generalizes to weighted L1-norms. The second algorithm uses bisection to compute updates for robust MDPs with s-rectangular ambiguity sets. This algorithm, when combined with the homotopy method, also has a quasi-linear runtime. Unlike previous methods, our algorithms compute the primal solution in addition to the optimal objective value, which makes them useful in policy iteration methods. Our experimental results indicate that the proposed methods are over 1,000 times faster than Gurobi, a state-of-the-art commercial optimization package, for small instances, and the performance gap grows considerably with problem size.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-ho18a, title = {Fast {B}ellman Updates for Robust {MDP}s}, author = {Ho, Chin Pang and Petrik, Marek and Wiesemann, Wolfram}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {1979--1988}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/ho18a/ho18a.pdf}, url = {https://proceedings.mlr.press/v80/ho18a.html}, abstract = {We describe two efficient, and exact, algorithms for computing Bellman updates in robust Markov decision processes (MDPs). The first algorithm uses a homotopy continuation method to compute updates for L1-constrained s,a-rectangular ambiguity sets. It runs in quasi-linear time for plain L1-norms and also generalizes to weighted L1-norms. The second algorithm uses bisection to compute updates for robust MDPs with s-rectangular ambiguity sets. This algorithm, when combined with the homotopy method, also has a quasi-linear runtime. Unlike previous methods, our algorithms compute the primal solution in addition to the optimal objective value, which makes them useful in policy iteration methods. Our experimental results indicate that the proposed methods are over 1,000 times faster than Gurobi, a state-of-the-art commercial optimization package, for small instances, and the performance gap grows considerably with problem size.} }
Endnote
%0 Conference Paper %T Fast Bellman Updates for Robust MDPs %A Chin Pang Ho %A Marek Petrik %A Wolfram Wiesemann %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-ho18a %I PMLR %P 1979--1988 %U https://proceedings.mlr.press/v80/ho18a.html %V 80 %X We describe two efficient, and exact, algorithms for computing Bellman updates in robust Markov decision processes (MDPs). The first algorithm uses a homotopy continuation method to compute updates for L1-constrained s,a-rectangular ambiguity sets. It runs in quasi-linear time for plain L1-norms and also generalizes to weighted L1-norms. The second algorithm uses bisection to compute updates for robust MDPs with s-rectangular ambiguity sets. This algorithm, when combined with the homotopy method, also has a quasi-linear runtime. Unlike previous methods, our algorithms compute the primal solution in addition to the optimal objective value, which makes them useful in policy iteration methods. Our experimental results indicate that the proposed methods are over 1,000 times faster than Gurobi, a state-of-the-art commercial optimization package, for small instances, and the performance gap grows considerably with problem size.
APA
Ho, C.P., Petrik, M. & Wiesemann, W.. (2018). Fast Bellman Updates for Robust MDPs. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1979-1988 Available from https://proceedings.mlr.press/v80/ho18a.html.

Related Material