Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs

Ali Asadi; Krishnendu Chatterjee; Ehsan Goharshady; Mehrdad Karrabi; Alipasha Montaseri; Carlo Pagano

Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:427-457, 2026.

Abstract

Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for any arbitrary discount factor, and the seminal work of Ye established strongly-polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.

Cite this Paper

BibTeX

@InProceedings{pmlr-v336-asadi26a,
  title = 	 {Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs},
  author =       {Asadi, Ali and Chatterjee, Krishnendu and Goharshady, Ehsan and Karrabi, Mehrdad and Montaseri, Alipasha and Pagano, Carlo},
  booktitle = 	 {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = 	 {427--457},
  year = 	 {2026},
  editor = 	 {Hanneke, Steve and Lattimore, Tor},
  volume = 	 {336},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {29 Jun--03 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v336/main/assets/asadi26a/asadi26a.pdf},
  url = 	 {https://proceedings.mlr.press/v336/asadi26a.html},
  abstract = 	 {Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for any arbitrary discount factor, and the seminal work of Ye established strongly-polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.}
}

Endnote

%0 Conference Paper
%T Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs
%A Ali Asadi
%A Krishnendu Chatterjee
%A Ehsan Goharshady
%A Mehrdad Karrabi
%A Alipasha Montaseri
%A Carlo Pagano
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-asadi26a
%I PMLR
%P 427--457
%U https://proceedings.mlr.press/v336/asadi26a.html
%V 336
%X Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for any arbitrary discount factor, and the seminal work of Ye established strongly-polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.

APA

Asadi, A., Chatterjee, K., Goharshady, E., Karrabi, M., Montaseri, A. & Pagano, C. (2026). Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:427-457. Available from https://proceedings.mlr.press/v336/asadi26a.html.
