Fast Bellman Updates for Robust MDPs
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1979-1988, 2018.
Abstract
We describe two efficient and exact algorithms for computing Bellman updates in robust Markov decision processes (MDPs). The first algorithm uses a homotopy continuation method to compute updates for L1-constrained s,a-rectangular ambiguity sets. It runs in quasi-linear time for plain L1-norms and also generalizes to weighted L1-norms. The second algorithm uses bisection to compute updates for robust MDPs with s-rectangular ambiguity sets. This algorithm, when combined with the homotopy method, also has a quasi-linear runtime. Unlike previous methods, our algorithms compute the primal solution in addition to the optimal objective value, which makes them useful in policy iteration methods. Our experimental results indicate that the proposed methods are over 1,000 times faster than Gurobi, a state-of-the-art commercial optimization package, for small instances, and the performance gap grows considerably with problem size.
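To make the inner problem concrete: for an s,a-rectangular L1-constrained set, each Bellman update must solve min_p { p·v : p in the probability simplex, ||p − p̄||₁ ≤ κ }, where p̄ is the nominal transition distribution and v the value vector. The sketch below solves this subproblem by sorting in O(n log n), shifting probability mass from the highest-value states to the lowest-value state; it is a simple illustrative baseline, not the paper's homotopy continuation method, and the function name and tolerances are our own.

```python
import numpy as np

def worst_case_l1(pbar, v, kappa):
    """Solve min_p { p @ v : p in simplex, ||p - pbar||_1 <= kappa }.

    Illustrative O(n log n) sketch: move up to kappa/2 probability mass
    onto the state with the smallest value, taking it from the states
    with the largest values first. Returns (worst-case value, worst p).
    """
    pbar = np.asarray(pbar, dtype=float)
    v = np.asarray(v, dtype=float)
    i_min = int(np.argmin(v))
    p = pbar.copy()
    # An L1 budget of kappa lets us relocate kappa/2 of probability mass,
    # capped by the headroom remaining at the minimizing state.
    eps = min(kappa / 2.0, 1.0 - p[i_min])
    p[i_min] += eps
    # Remove the same total mass, starting from the most valuable states.
    to_remove = eps
    for i in np.argsort(v)[::-1]:
        if i == i_min:
            continue
        take = min(p[i], to_remove)
        p[i] -= take
        to_remove -= take
        if to_remove <= 1e-12:
            break
    return float(p @ v), p
```

For example, with a uniform nominal distribution over four states, values v = (1, 2, 3, 4), and κ = 0.4, the worst case shifts 0.2 of mass from the best state to the worst, lowering the expectation from 2.5 to 1.9. A full robust Bellman update applies this subroutine once per state-action pair, which is what makes its per-call cost the dominant factor the paper's algorithms target.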