Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL

Andrea Zanette
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12287-12297, 2021.

Abstract

Several practical applications of reinforcement learning involve an agent learning from past data without the possibility of further exploration. These applications often require us to 1) identify a near-optimal policy or 2) estimate the value of a target policy. For both tasks we derive exponential information-theoretic lower bounds in discounted infinite-horizon MDPs with a linear function representation for the action-value function, even if 1) realizability holds, 2) the batch algorithm observes the exact reward and transition functions, and 3) the batch algorithm is given the best a priori data distribution for the problem class. Our work introduces a new ‘oracle + batch algorithm’ framework to prove lower bounds that hold for every data distribution. The work shows an exponential separation between batch and online reinforcement learning.
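To make the setting concrete, the representation condition the abstract invokes can be sketched as follows; this is a standard statement of linear realizability, and the feature map \phi, dimension d, and parameter \theta^{\pi} are illustrative notation rather than the paper's own:

% Linear realizability: the action-value function of the relevant policy
% (the target policy for value estimation, or an optimal policy for
% policy identification) is exactly linear in a known d-dimensional
% feature map phi : S x A -> R^d.
\exists\, \theta^{\pi} \in \mathbb{R}^{d} \ \text{such that} \quad
Q^{\pi}(s,a) = \langle \phi(s,a), \theta^{\pi} \rangle
\qquad \forall (s,a) \in \mathcal{S} \times \mathcal{A}.

Even under this condition, with exact knowledge of rewards and transitions on the sampled data and a favorable data distribution, the paper shows that batch algorithms can require exponentially many samples.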

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-zanette21a,
  title     = {Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL},
  author    = {Zanette, Andrea},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {12287--12297},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/zanette21a/zanette21a.pdf},
  url       = {https://proceedings.mlr.press/v139/zanette21a.html}
}
Endnote
%0 Conference Paper
%T Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL
%A Andrea Zanette
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-zanette21a
%I PMLR
%P 12287--12297
%U https://proceedings.mlr.press/v139/zanette21a.html
%V 139
APA
Zanette, A. (2021). Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12287-12297. Available from https://proceedings.mlr.press/v139/zanette21a.html.
