Quantum algorithms for reinforcement learning with a generative model

Daochen Wang; Aarthi Sundaram; Robin Kothari; Ashish Kapoor; Martin Roetteler

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:10916-10926, 2021.

Abstract

Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an optimal policy for a $\gamma$-discounted Markov decision process (MDP). For such an MDP, we design quantum algorithms that approximate an optimal policy ($\pi^*$), the optimal value function ($v^*$), and the optimal $Q$-function ($q^*$), assuming the algorithms can access samples from the environment in quantum superposition. This assumption is justified whenever there exists a simulator for the environment; for example, if the environment is a video game or some other program. Our quantum algorithms, inspired by value iteration, achieve quadratic speedups over the best-possible classical sample complexities in the approximation accuracy ($\epsilon$) and two main parameters of the MDP: the effective time horizon ($\frac{1}{1-\gamma}$) and the size of the action space ($A$). Moreover, we show that our quantum algorithm for computing $q^*$ is optimal by proving a matching quantum lower bound.
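For orientation, the classical setting the abstract compares against can be sketched as sampling-based value iteration with a generative model. This is only an illustrative sketch, not the paper's quantum algorithm: the function `sampled_value_iteration`, its parameters, and the two-state toy MDP are all hypothetical names introduced here. The generative model is assumed to be a callable that returns one sampled next state for a given state-action pair.

```python
import random


def sampled_value_iteration(states, actions, reward, sample_next,
                            gamma, n_samples, n_iters, seed=0):
    """Classical value iteration where expectations over next states are
    replaced by empirical means of samples from a generative model.
    Assumes n_iters >= 1 and a non-empty action list."""
    rng = random.Random(seed)
    v = {s: 0.0 for s in states}
    q = {}
    for _ in range(n_iters):
        q = {}
        for s in states:
            for a in actions:
                # Query the generative model n_samples times for (s, a).
                draws = [sample_next(s, a, rng) for _ in range(n_samples)]
                mean_next = sum(v[t] for t in draws) / n_samples
                q[(s, a)] = reward(s, a) + gamma * mean_next
        # Greedy backup: v(s) = max_a q(s, a).
        v = {s: max(q[(s, a)] for a in actions) for s in states}
    policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in states}
    return policy, v, q


# Hypothetical toy MDP: two states, deterministic transitions.
# "stay" keeps the state, "move" flips it; only staying in state 1 pays.
def toy_reward(s, a):
    return 1.0 if (s == 1 and a == "stay") else 0.0


def toy_sample_next(s, a, rng):
    # rng is unused because this toy simulator is deterministic.
    return s if a == "stay" else 1 - s


policy, v, q = sampled_value_iteration(
    states=[0, 1], actions=["stay", "move"],
    reward=toy_reward, sample_next=toy_sample_next,
    gamma=0.5, n_samples=4, n_iters=30,
)
print(policy, v)
```

In this toy example the effective time horizon is $\frac{1}{1-\gamma} = 2$, which is also the optimal value of state 1. Each iteration draws `n_samples` samples per state-action pair; the number of such samples is the classical cost that the abstract says the quantum algorithms improve on.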

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-wang21w,
  title = 	 {Quantum algorithms for reinforcement learning with a generative model},
  author =       {Wang, Daochen and Sundaram, Aarthi and Kothari, Robin and Kapoor, Ashish and Roetteler, Martin},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {10916--10926},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/wang21w/wang21w.pdf},
  url = 	 {https://proceedings.mlr.press/v139/wang21w.html},
  abstract = 	 {Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an optimal policy for a $\gamma$-discounted Markov decision process (MDP). For such an MDP, we design quantum algorithms that approximate an optimal policy ($\pi^*$), the optimal value function ($v^*$), and the optimal $Q$-function ($q^*$), assuming the algorithms can access samples from the environment in quantum superposition. This assumption is justified whenever there exists a simulator for the environment; for example, if the environment is a video game or some other program. Our quantum algorithms, inspired by value iteration, achieve quadratic speedups over the best-possible classical sample complexities in the approximation accuracy ($\epsilon$) and two main parameters of the MDP: the effective time horizon ($\frac{1}{1-\gamma}$) and the size of the action space ($A$). Moreover, we show that our quantum algorithm for computing $q^*$ is optimal by proving a matching quantum lower bound.}
}

Endnote

%0 Conference Paper
%T Quantum algorithms for reinforcement learning with a generative model
%A Daochen Wang
%A Aarthi Sundaram
%A Robin Kothari
%A Ashish Kapoor
%A Martin Roetteler
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang	
%F pmlr-v139-wang21w
%I PMLR
%P 10916--10926
%U https://proceedings.mlr.press/v139/wang21w.html
%V 139
%X Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an optimal policy for a $\gamma$-discounted Markov decision process (MDP). For such an MDP, we design quantum algorithms that approximate an optimal policy ($\pi^*$), the optimal value function ($v^*$), and the optimal $Q$-function ($q^*$), assuming the algorithms can access samples from the environment in quantum superposition. This assumption is justified whenever there exists a simulator for the environment; for example, if the environment is a video game or some other program. Our quantum algorithms, inspired by value iteration, achieve quadratic speedups over the best-possible classical sample complexities in the approximation accuracy ($\epsilon$) and two main parameters of the MDP: the effective time horizon ($\frac{1}{1-\gamma}$) and the size of the action space ($A$). Moreover, we show that our quantum algorithm for computing $q^*$ is optimal by proving a matching quantum lower bound.

APA


Wang, D., Sundaram, A., Kothari, R., Kapoor, A. & Roetteler, M. (2021). Quantum algorithms for reinforcement learning with a generative model. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:10916-10926. Available from https://proceedings.mlr.press/v139/wang21w.html.
