Quantum Algorithms for Finite-horizon Markov Decision Processes

Bin Luo; Yuwen Huang; Jonathan Allcock; Xiaojun Lin; Shengyu Zhang; John C.S. Lui

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:41200-41234, 2025.

Abstract

In this work, we design quantum algorithms that are more efficient than classical algorithms to solve time-dependent and finite-horizon Markov Decision Processes (MDPs) in two distinct settings: (1) In the exact dynamics setting, where the agent has full knowledge of the environment’s dynamics (i.e., transition probabilities), we prove that our Quantum Value Iteration (QVI) algorithm QVI-1 achieves a quadratic speedup in the size of the action space $(A)$ compared with the classical value iteration algorithm for computing the optimal policy ($\pi^{\ast}$) and the optimal V-value function ($V_{0}^{\ast}$). Furthermore, our algorithm QVI-2 provides an additional speedup in the size of the state space $(S)$ when obtaining near-optimal policies and V-value functions. Both QVI-1 and QVI-2 achieve quantum query complexities that provably improve upon classical lower bounds, particularly in their dependences on $S$ and $A$. (2) In the generative model setting, where samples from the environment are accessible in quantum superposition, we prove that our algorithms QVI-3 and QVI-4 achieve improvements in sample complexity over the state-of-the-art (SOTA) classical algorithm in terms of $A$, estimation error $(\epsilon)$, and time horizon $(H)$. More importantly, we prove quantum lower bounds to show that QVI-3 and QVI-4 are asymptotically optimal, up to logarithmic factors, assuming a constant time horizon.
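For orientation, the classical value iteration baseline that the abstract compares against can be sketched as backward induction over the horizon. This is a minimal illustration, not the paper's QVI algorithms: the function name, the nested-list layout `P[h][s][a][s_next]` and `R[h][s][a]`, and the toy two-state MDP are all assumptions made for this example. The inner loop over actions is where the classical cost scales linearly in $A$, which is the dependence the abstract's quadratic speedup refers to.

```python
def finite_horizon_value_iteration(P, R, H):
    """Classical backward induction for a time-dependent, finite-horizon MDP.

    Assumed (hypothetical) layout:
      P[h][s][a] is a list of next-state probabilities at time step h,
      R[h][s][a] is the immediate reward at time step h.
    Returns the optimal value function at time 0 and a greedy policy
    policy[h][s] for every time step.
    """
    S = len(P[0])
    A = len(P[0][0])
    V = [0.0] * S  # terminal values V_H = 0
    policy = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        V_new = [0.0] * S
        for s in range(S):
            best_a, best_q = 0, float("-inf")
            for a in range(A):  # classical scan over all A actions
                q = R[h][s][a] + sum(P[h][s][a][t] * V[t] for t in range(S))
                if q > best_q:
                    best_a, best_q = a, q
            V_new[s] = best_q
            policy[h][s] = best_a
        V = V_new
    return V, policy


# Toy MDP (assumed for illustration): 2 states, 2 actions, horizon 2.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 0.5).
# State 1: action 0 stays (reward 1), action 1 moves to state 0 (reward 0).
step_P = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [1.0, 0.0]]]
step_R = [[0.0, 0.5],
          [1.0, 0.0]]
H = 2
V0, policy = finite_horizon_value_iteration([step_P] * H, [step_R] * H, H)
print(V0, policy)
```

Each of the $H$ backward steps touches every state, every action, and every next state, so this sketch performs on the order of $S^2 A H$ arithmetic operations when the transition lists are dense.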

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-luo25e,
  title = 	 {Quantum Algorithms for Finite-horizon {M}arkov Decision Processes},
  author =       {Luo, Bin and Huang, Yuwen and Allcock, Jonathan and Lin, Xiaojun and Zhang, Shengyu and Lui, John C.S.},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {41200--41234},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/luo25e/luo25e.pdf},
  url = 	 {https://proceedings.mlr.press/v267/luo25e.html},
  abstract = 	 {In this work, we design quantum algorithms that are more efficient than classical algorithms to solve time-dependent and finite-horizon Markov Decision Processes (MDPs) in two distinct settings: (1) In the exact dynamics setting, where the agent has full knowledge of the environment’s dynamics (i.e., transition probabilities), we prove that our Quantum Value Iteration (QVI) algorithm QVI-1 achieves a quadratic speedup in the size of the action space $(A)$ compared with the classical value iteration algorithm for computing the optimal policy ($\pi^{\ast}$) and the optimal V-value function ($V_{0}^{\ast}$). Furthermore, our algorithm QVI-2 provides an additional speedup in the size of the state space $(S)$ when obtaining near-optimal policies and V-value functions. Both QVI-1 and QVI-2 achieve quantum query complexities that provably improve upon classical lower bounds, particularly in their dependences on $S$ and $A$. (2) In the generative model setting, where samples from the environment are accessible in quantum superposition, we prove that our algorithms QVI-3 and QVI-4 achieve improvements in sample complexity over the state-of-the-art (SOTA) classical algorithm in terms of $A$, estimation error $(\epsilon)$, and time horizon $(H)$. More importantly, we prove quantum lower bounds to show that QVI-3 and QVI-4 are asymptotically optimal, up to logarithmic factors, assuming a constant time horizon.}
}

Endnote

%0 Conference Paper
%T Quantum Algorithms for Finite-horizon Markov Decision Processes
%A Bin Luo
%A Yuwen Huang
%A Jonathan Allcock
%A Xiaojun Lin
%A Shengyu Zhang
%A John C.S. Lui
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-luo25e
%I PMLR
%P 41200--41234
%U https://proceedings.mlr.press/v267/luo25e.html
%V 267
%X In this work, we design quantum algorithms that are more efficient than classical algorithms to solve time-dependent and finite-horizon Markov Decision Processes (MDPs) in two distinct settings: (1) In the exact dynamics setting, where the agent has full knowledge of the environment’s dynamics (i.e., transition probabilities), we prove that our Quantum Value Iteration (QVI) algorithm QVI-1 achieves a quadratic speedup in the size of the action space $(A)$ compared with the classical value iteration algorithm for computing the optimal policy ($\pi^{\ast}$) and the optimal V-value function ($V_{0}^{\ast}$). Furthermore, our algorithm QVI-2 provides an additional speedup in the size of the state space $(S)$ when obtaining near-optimal policies and V-value functions. Both QVI-1 and QVI-2 achieve quantum query complexities that provably improve upon classical lower bounds, particularly in their dependences on $S$ and $A$. (2) In the generative model setting, where samples from the environment are accessible in quantum superposition, we prove that our algorithms QVI-3 and QVI-4 achieve improvements in sample complexity over the state-of-the-art (SOTA) classical algorithm in terms of $A$, estimation error $(\epsilon)$, and time horizon $(H)$. More importantly, we prove quantum lower bounds to show that QVI-3 and QVI-4 are asymptotically optimal, up to logarithmic factors, assuming a constant time horizon.

APA

Luo, B., Huang, Y., Allcock, J., Lin, X., Zhang, S., & Lui, J.C.S. (2025). Quantum Algorithms for Finite-horizon Markov Decision Processes. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:41200-41234. Available from https://proceedings.mlr.press/v267/luo25e.html.
