On the Expressivity of Neural Networks for Deep Reinforcement Learning

Kefan Dong, Yuping Luo, Tianhe Yu, Chelsea Finn, Tengyu Ma
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2627-2637, 2020.

Abstract

We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than the dynamics. For these MDPs, model-based planning is a favorable approach, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, whereas model-free and model-based policy optimization rely on such a parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at test time improves performance on benchmark tasks.
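The abstract describes BOOTS only at a high level, so the following is a minimal illustrative sketch of test-time multi-step bootstrapped planning in that spirit, not the authors' implementation. It assumes a learned dynamics model, reward function, and Q-function are available as vectorized callables, and it uses random shooting with a hypothetical horizon and candidate count; the paper's actual planner and hyperparameters may differ.

# A minimal sketch (not the authors' implementation) of multi-step
# bootstrapped planning in the spirit of BOOTS: roll a learned dynamics
# model forward for h steps, then bootstrap the tail with a (possibly weak)
# learned Q-function. Horizon, candidate count, and the random-shooting
# optimizer are illustrative assumptions.
import numpy as np

def boots_action(state, dynamics, reward, q_fn, action_dim,
                 horizon=4, n_candidates=1000, low=-1.0, high=1.0):
    """Return the first action of the best sampled h-step plan.

    dynamics(s, a) -> next states, reward(s, a) -> rewards, q_fn(s, a) -> Q
    values; all are assumed to operate on batches of states/actions.
    """
    # Sample horizon + 1 actions per candidate: h for the model rollout,
    # plus one extra action for the terminal Q-function bootstrap.
    seqs = np.random.uniform(low, high,
                             size=(n_candidates, horizon + 1, action_dim))
    states = np.repeat(state[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):                      # simulate h steps with the model
        returns += reward(states, seqs[:, t])
        states = dynamics(states, seqs[:, t])
    returns += q_fn(states, seqs[:, horizon])     # bootstrap with Q(s_h, a_h)
    best = int(np.argmax(returns))
    return seqs[best, 0]                          # MPC-style: execute only the first action

With horizon 0 this reduces to acting greedily with respect to the Q-function; increasing the horizon shifts more of the decision onto the learned dynamics, which is what the expressivity argument favors when the optimal Q-function is much more complex than the dynamics.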

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-dong20d,
  title     = {On the Expressivity of Neural Networks for Deep Reinforcement Learning},
  author    = {Dong, Kefan and Luo, Yuping and Yu, Tianhe and Finn, Chelsea and Ma, Tengyu},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {2627--2637},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/dong20d/dong20d.pdf},
  url       = {https://proceedings.mlr.press/v119/dong20d.html}
}
Endnote
%0 Conference Paper
%T On the Expressivity of Neural Networks for Deep Reinforcement Learning
%A Kefan Dong
%A Yuping Luo
%A Tianhe Yu
%A Chelsea Finn
%A Tengyu Ma
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-dong20d
%I PMLR
%P 2627--2637
%U https://proceedings.mlr.press/v119/dong20d.html
%V 119
APA
Dong, K., Luo, Y., Yu, T., Finn, C. & Ma, T. (2020). On the Expressivity of Neural Networks for Deep Reinforcement Learning. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2627-2637. Available from https://proceedings.mlr.press/v119/dong20d.html.
