Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:61681-61707, 2024.
Abstract
While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the exploration-exploitation trade-off. To this end, we propose a novel UCRL-style algorithm that takes advantage of quantum computing for tabular Markov decision processes (MDPs) with S states, A actions, and horizon H, and establish an O(poly(S, A, H, log T)) worst-case regret bound for it, where T is the number of episodes. Furthermore, we extend our results to quantum RL with linear function approximation, which is capable of handling problems with large state spaces. Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with d-dimensional linear representation and prove that it enjoys O(poly(d, H, log T)) regret. Our algorithms are variants of the classical UCRL/UCRL-VTR algorithms, augmented with a novel combination of lazy updating mechanisms and quantum estimation subroutines. This is the key to breaking the Ω(√T)-regret barrier in classical RL. To the best of our knowledge, this is the first work studying online exploration in quantum RL with provable logarithmic worst-case regret.
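The abstract attributes the logarithmic regret to combining lazy (infrequent) model updates with quantum estimation subroutines, whose error shrinks as 1/n in the number of quantum queries rather than the classical 1/√n. The toy Python sketch below is not the paper's algorithm; it is a minimal illustration, under these assumed error rates, of how a doubling-epoch lazy-update schedule yields only about log2(T) estimate refreshes while the quantum-rate confidence width shrinks much faster than the classical one.

```python
import math

def classical_width(n: int) -> float:
    # Hoeffding-style concentration: estimation error ~ 1/sqrt(n)
    # after n classical samples (constants omitted for illustration).
    return 1.0 / math.sqrt(n)

def quantum_width(n: int) -> float:
    # Assumed quantum mean-estimation rate: error ~ 1/n after n quantum
    # queries, the scaling the abstract's log T regret relies on.
    return 1.0 / n

def lazy_update_points(T: int) -> list[int]:
    # Lazy updating via a doubling schedule: the model/policy is refreshed
    # only when the accumulated sample count doubles, giving O(log T)
    # updates over T episodes.
    points, n = [], 1
    while n <= T:
        points.append(n)
        n *= 2
    return points

if __name__ == "__main__":
    T = 10_000
    updates = lazy_update_points(T)
    print(f"T = {T} episodes -> {len(updates)} lazy updates (~ log2 T)")
    for n in updates[-3:]:
        print(f"  n = {n:6d}: classical width ~ {classical_width(n):.4f}, "
              f"quantum width ~ {quantum_width(n):.6f}")
```

Running the sketch shows roughly 14 update points for T = 10,000 and a quantum-rate confidence width two orders of magnitude smaller than the classical one at the final epoch, which is the intuition behind replacing the Ω(√T) regret with a poly-logarithmic dependence on T.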