On the Expressivity of Neural Networks for Deep Reinforcement Learning

Kefan Dong; Yuping Luo; Tianhe Yu; Chelsea Finn; Tengyu Ma

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2627-2637, 2020.

Abstract

We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than the dynamics. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and both model-free and model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at test time improves performance on benchmark tasks.
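The idea of bootstrapping a weak Q-function into a stronger policy through multi-step planning can be illustrated with a small sketch. This is not the authors' implementation: it assumes a finite action set, a deterministic dynamics model, a known reward function, and exhaustive search over action sequences. The function name lookahead_policy and the toy environment (toy_dynamics, toy_reward, zero_q) are hypothetical names introduced only for this illustration.

```python
from itertools import product


def lookahead_policy(state, actions, dynamics, reward, q_fn, k, gamma=0.9):
    """Pick the first action of the best (k + 1)-action sequence.

    Each sequence is scored by k discounted model-predicted rewards
    plus the discounted Q-value of the final state-action pair.
    With k = 0 this reduces to acting greedily with respect to q_fn.
    Assumption: exhaustive search, so the cost grows as len(actions) ** (k + 1).
    """
    best_action, best_value = actions[0], float("-inf")
    for seq in product(actions, repeat=k + 1):
        s, value = state, 0.0
        # Roll the model forward for k steps, accumulating rewards.
        for i, a in enumerate(seq[:-1]):
            value += gamma ** i * reward(s, a)
            s = dynamics(s, a)
        # Bootstrap from the (possibly inaccurate) Q-function at the leaf.
        value += gamma ** k * q_fn(s, seq[-1])
        if value > best_value:
            best_action, best_value = seq[0], value
    return best_action, best_value


# Hypothetical one-dimensional toy problem: move toward the point 0.5.
def toy_dynamics(s, a):
    return s + 0.1 * a


def toy_reward(s, a):
    return -abs(toy_dynamics(s, a) - 0.5)


def zero_q(s, a):
    # A deliberately uninformative ("weak") Q-function.
    return 0.0


greedy_action, _ = lookahead_policy(0.0, (-1, 1), toy_dynamics, toy_reward, zero_q, k=0)
planned_action, planned_value = lookahead_policy(0.0, (-1, 1), toy_dynamics, toy_reward, zero_q, k=2)
```

In this toy setting the uninformative Q-function alone cannot distinguish the two actions, while two steps of lookahead through the model recover the action that moves toward the target. Exhaustive enumeration is only practical for tiny action sets and short horizons; a sampling-based search would be the natural substitute in continuous action spaces.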

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-dong20d,
  title = 	 {On the Expressivity of Neural Networks for Deep Reinforcement Learning},
  author =       {Dong, Kefan and Luo, Yuping and Yu, Tianhe and Finn, Chelsea and Ma, Tengyu},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {2627--2637},
  year = 	 {2020},
  editor = 	 {Daumé, III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/dong20d/dong20d.pdf},
  url = 	 {https://proceedings.mlr.press/v119/dong20d.html},
  abstract = 	 {We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than the dynamics. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and model-free or model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at the test time improves the performance on benchmark tasks.}
}

Endnote

%0 Conference Paper
%T On the Expressivity of Neural Networks for Deep Reinforcement Learning
%A Kefan Dong
%A Yuping Luo
%A Tianhe Yu
%A Chelsea Finn
%A Tengyu Ma
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-dong20d
%I PMLR
%P 2627--2637
%U https://proceedings.mlr.press/v119/dong20d.html
%V 119
%X We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than the dynamics. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and model-free or model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at the test time improves the performance on benchmark tasks.

APA

Dong, K., Luo, Y., Yu, T., Finn, C., & Ma, T. (2020). On the Expressivity of Neural Networks for Deep Reinforcement Learning. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2627-2637. Available from https://proceedings.mlr.press/v119/dong20d.html.
