Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

Dhruv Malik, Aldo Pacchiano, Vishwak Srinivasan, Yuanzhi Li
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7412-7422, 2021.

Abstract

Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, the majority of theoretical RL literature requires the MDP to satisfy some form of linear structure, in order to guarantee sample efficient RL. Such efforts typically assume the transition dynamics or value function of the MDP are described by linear functions of the state features. To resolve this discrepancy between theory and practice, we introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions. We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition. Our algorithm requires minimal assumptions on the policy class, which can include multi-layer neural networks with nonlinear activation functions. Notably, the EPW condition is directly motivated by popular gaming benchmarks, and we show that many classic Atari games satisfy this condition. We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-malik21c,
  title     = {Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity},
  author    = {Malik, Dhruv and Pacchiano, Aldo and Srinivasan, Vishwak and Li, Yuanzhi},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {7412--7422},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/malik21c/malik21c.pdf},
  url       = {https://proceedings.mlr.press/v139/malik21c.html},
  abstract  = {Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, the majority of theoretical RL literature requires the MDP to satisfy some form of linear structure, in order to guarantee sample efficient RL. Such efforts typically assume the transition dynamics or value function of the MDP are described by linear functions of the state features. To resolve this discrepancy between theory and practice, we introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions. We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition. Our algorithm requires minimal assumptions on the policy class, which can include multi-layer neural networks with nonlinear activation functions. Notably, the EPW condition is directly motivated by popular gaming benchmarks, and we show that many classic Atari games satisfy this condition. We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.}
}
Endnote
%0 Conference Paper
%T Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
%A Dhruv Malik
%A Aldo Pacchiano
%A Vishwak Srinivasan
%A Yuanzhi Li
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-malik21c
%I PMLR
%P 7412--7422
%U https://proceedings.mlr.press/v139/malik21c.html
%V 139
%X Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, the majority of theoretical RL literature requires the MDP to satisfy some form of linear structure, in order to guarantee sample efficient RL. Such efforts typically assume the transition dynamics or value function of the MDP are described by linear functions of the state features. To resolve this discrepancy between theory and practice, we introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions. We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition. Our algorithm requires minimal assumptions on the policy class, which can include multi-layer neural networks with nonlinear activation functions. Notably, the EPW condition is directly motivated by popular gaming benchmarks, and we show that many classic Atari games satisfy this condition. We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
APA
Malik, D., Pacchiano, A., Srinivasan, V. & Li, Y. (2021). Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7412-7422. Available from https://proceedings.mlr.press/v139/malik21c.html.