Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes

Bhargav Ganguly, Yang Xu, Vaneet Aggarwal
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:18257-18276, 2025.

Abstract

This paper investigates how quantum acceleration can improve learning in infinite horizon average-reward Markov Decision Processes (MDPs). We introduce a quantum framework for the agent's interaction with an unknown MDP that extends the conventional interaction paradigm. Our approach is an optimism-driven tabular Reinforcement Learning algorithm that uses quantum signals acquired by the agent through efficient quantum mean estimation techniques. Through theoretical analysis, we demonstrate that the quantum advantage in mean estimation yields an exponential improvement in regret guarantees for infinite horizon Reinforcement Learning. Specifically, the proposed quantum algorithm achieves a regret bound of $\tilde{\mathcal{O}}(1)$ (where $\tilde{\mathcal{O}}(\cdot)$ hides logarithmic factors in $T$), a significant improvement over the $\tilde{\mathcal{O}}(\sqrt{T})$ bound achieved by classical counterparts, where $T$ is the length of the time horizon.
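The abstract attributes the gap between $\tilde{\mathcal{O}}(\sqrt{T})$ and $\tilde{\mathcal{O}}(1)$ to the quantum advantage in mean estimation. A rough heuristic for why: optimistic algorithms typically pay regret on the order of the summed widths of their confidence intervals. Classically, an estimate from $n$ samples has width shrinking like $1/\sqrt{n}$, which sums to order $\sqrt{T}$; known quantum mean estimators reach error $\varepsilon$ with roughly $1/\varepsilon$ queries (up to logarithmic factors), so the width shrinks like $1/n$ and sums to order $\log T$. The toy sketch below only illustrates this scaling. It is not the paper's algorithm or proof, and the helper names, the unit constant in the quantum radius, and the "one new sample per step" simplification are all assumptions.

```python
import math


def classical_radius(n: int, delta: float = 0.05) -> float:
    """Hoeffding confidence radius for the mean of n i.i.d. samples in [0, 1]."""
    return math.sqrt(math.log(2 / delta) / (2 * n))


def quantum_radius(n: int, delta: float = 0.05) -> float:
    """Hypothetical quantum-style radius scaling as log(1/delta) / n.

    The 1/n rate mirrors quantum mean estimation; the constant 1 is an
    illustrative assumption, not a value taken from the paper.
    """
    return math.log(1 / delta) / n


def summed_widths(radius, T: int) -> float:
    """Sum of widths if step t uses an estimate built from t samples (toy model)."""
    return sum(radius(t) for t in range(1, T + 1))


if __name__ == "__main__":
    for T in (10**2, 10**3, 10**4, 10**5):
        c = summed_widths(classical_radius, T)
        q = summed_widths(quantum_radius, T)
        # Classical sums grow roughly like sqrt(T); quantum sums like log(T).
        print(f"T={T:>7}: classical ~ {c:9.1f}   quantum ~ {q:6.1f}")
```

Multiplying $T$ by 100 multiplies the classical sum by roughly 10 but adds only a roughly constant amount to the quantum sum. This is the qualitative shape of the $\sqrt{T}$ versus logarithmic gap.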

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-ganguly25a,
  title     = {Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward {M}arkov Decision Processes},
  author    = {Ganguly, Bhargav and Xu, Yang and Aggarwal, Vaneet},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {18257--18276},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/ganguly25a/ganguly25a.pdf},
  url       = {https://proceedings.mlr.press/v267/ganguly25a.html},
  abstract  = {This paper investigates the potential of quantum acceleration in addressing infinite horizon Markov Decision Processes (MDPs) to enhance average reward outcomes. We introduce an innovative quantum framework for the agent’s engagement with an unknown MDP, extending the conventional interaction paradigm. Our approach involves the design of an optimism-driven tabular Reinforcement Learning algorithm that harnesses quantum signals acquired by the agent through efficient quantum mean estimation techniques. Through thorough theoretical analysis, we demonstrate that the quantum advantage in mean estimation leads to exponential advancements in regret guarantees for infinite horizon Reinforcement Learning. Specifically, the proposed Quantum algorithm achieves a regret bound of $\tilde{\mathcal{O}}(1)$[$\tilde{\mathcal{O}}(\cdot)$ conceals logarithmic terms of $T$.], a significant improvement over the $\tilde{\mathcal{O}}(\sqrt{T})$ bound exhibited by classical counterparts, where $T$ is the length of the time horizon.}
}
Endnote
%0 Conference Paper
%T Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
%A Bhargav Ganguly
%A Yang Xu
%A Vaneet Aggarwal
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-ganguly25a
%I PMLR
%P 18257--18276
%U https://proceedings.mlr.press/v267/ganguly25a.html
%V 267
%X This paper investigates the potential of quantum acceleration in addressing infinite horizon Markov Decision Processes (MDPs) to enhance average reward outcomes. We introduce an innovative quantum framework for the agent’s engagement with an unknown MDP, extending the conventional interaction paradigm. Our approach involves the design of an optimism-driven tabular Reinforcement Learning algorithm that harnesses quantum signals acquired by the agent through efficient quantum mean estimation techniques. Through thorough theoretical analysis, we demonstrate that the quantum advantage in mean estimation leads to exponential advancements in regret guarantees for infinite horizon Reinforcement Learning. Specifically, the proposed Quantum algorithm achieves a regret bound of $\tilde{\mathcal{O}}(1)$[$\tilde{\mathcal{O}}(\cdot)$ conceals logarithmic terms of $T$.], a significant improvement over the $\tilde{\mathcal{O}}(\sqrt{T})$ bound exhibited by classical counterparts, where $T$ is the length of the time horizon.
APA
Ganguly, B., Xu, Y., & Aggarwal, V. (2025). Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:18257-18276. Available from https://proceedings.mlr.press/v267/ganguly25a.html.
