Randomization for Faster Exact Optimization of Discounted Markov Decision Processes

Andrei Graur, Aaron Sidford, Ta-Wei Tu
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:2878-2900, 2026.

Abstract

We provide faster running times for exactly solving discounted Markov Decision Processes (DMDPs) in strongly polynomial time. We obtain our results by efficiently reducing computing optimal values and policies in DMDPs to the easier tasks of policy evaluation and computing approximately optimal values. We provide both a straightforward deterministic reduction and a more efficient randomized variant that, together with advances in approximately solving DMDPs, yield our results.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-graur26a,
  title = {Randomization for Faster Exact Optimization of Discounted Markov Decision Processes},
  author = {Graur, Andrei and Sidford, Aaron and Tu, Ta-Wei},
  booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory},
  pages = {2878--2900},
  year = {2026},
  editor = {Hanneke, Steve and Lattimore, Tor},
  volume = {336},
  series = {Proceedings of Machine Learning Research},
  month = {29 Jun--03 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/graur26a/graur26a.pdf},
  url = {https://proceedings.mlr.press/v336/graur26a.html},
  abstract = {We provide faster running times for exactly solving discounted Markov Decision Processes (DMDPs) in strongly polynomial time. We obtain our results by efficiently reducing computing optimal values and policies in DMDPs to the easier tasks of policy evaluation and computing approximately optimal values. We provide both a straightforward deterministic reduction and a more efficient randomized variant that, together with advances in approximately solving DMDPs, yield our results.}
}
Endnote
%0 Conference Paper
%T Randomization for Faster Exact Optimization of Discounted Markov Decision Processes
%A Andrei Graur
%A Aaron Sidford
%A Ta-Wei Tu
%B Proceedings of Thirty Ninth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Steve Hanneke
%E Tor Lattimore
%F pmlr-v336-graur26a
%I PMLR
%P 2878--2900
%U https://proceedings.mlr.press/v336/graur26a.html
%V 336
%X We provide faster running times for exactly solving discounted Markov Decision Processes (DMDPs) in strongly polynomial time. We obtain our results by efficiently reducing computing optimal values and policies in DMDPs to the easier tasks of policy evaluation and computing approximately optimal values. We provide both a straightforward deterministic reduction and a more efficient randomized variant that, together with advances in approximately solving DMDPs, yield our results.
APA
Graur, A., Sidford, A., & Tu, T. (2026). Randomization for Faster Exact Optimization of Discounted Markov Decision Processes. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:2878-2900. Available from https://proceedings.mlr.press/v336/graur26a.html.
