When is Realizability Sufficient for Off-Policy Reinforcement Learning?

Andrea Zanette
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:40637-40668, 2023.

Abstract

Understanding when reinforcement learning algorithms can make successful off-policy predictions, and when they may fail to do so, remains an open problem. Typically, model-free algorithms for reinforcement learning are analyzed under a condition called Bellman completeness when they operate off-policy with function approximation, unless additional conditions are met. However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: they are the metric entropy of the function class and the concentrability coefficient that represents the cost of learning off-policy. The third factor is new, and it measures the violation of Bellman completeness, namely the misalignment between the chosen function class and its image through the Bellman operator. Our analysis directly applies to the solution found by temporal difference algorithms when they converge.
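For readers less familiar with the terminology, the snippet below sketches, in generic notation not taken from the paper, the standard definitions behind the abstract: realizability, Bellman completeness, the inherent Bellman error (the approximation term the paper's guarantees avoid), and a concentrability coefficient (the cost of learning off-policy). The paper's third factor, the misalignment between the function class and its image under the Bellman operator, is a distinct quantity and is not reproduced here.

% Minimal, self-contained sketch (compiles with pdflatex) of the standard
% structural conditions referenced in the abstract. Notation is generic and
% NOT taken from the paper: F is the value-function class, T^pi the Bellman
% operator of the target policy pi, mu the off-policy data distribution,
% and d^pi the state-action distribution induced by pi.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
\begin{align*}
  &\text{Realizability:}          && Q^{\pi} \in \mathcal{F} \\
  &\text{Bellman completeness:}   && \mathcal{T}^{\pi} f \in \mathcal{F}
      \quad \text{for every } f \in \mathcal{F} \\
  &\text{Inherent Bellman error:} && \sup_{f \in \mathcal{F}}\,
      \inf_{g \in \mathcal{F}} \bigl\| g - \mathcal{T}^{\pi} f \bigr\|_{2,\mu} \\
  &\text{Concentrability:}        && \sup_{(s,a)} \frac{d^{\pi}(s,a)}{\mu(s,a)}
\end{align*}
\end{document}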

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-zanette23a,
  title     = {When is Realizability Sufficient for Off-Policy Reinforcement Learning?},
  author    = {Zanette, Andrea},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {40637--40668},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/zanette23a/zanette23a.pdf},
  url       = {https://proceedings.mlr.press/v202/zanette23a.html},
  abstract  = {Understanding when reinforcement learning algorithms can make successful off-policy predictions, and when they may fail to do so, remains an open problem. Typically, model-free algorithms for reinforcement learning are analyzed under a condition called Bellman completeness when they operate off-policy with function approximation, unless additional conditions are met. However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: they are the metric entropy of the function class and the concentrability coefficient that represents the cost of learning off-policy. The third factor is new, and it measures the violation of Bellman completeness, namely the misalignment between the chosen function class and its image through the Bellman operator. Our analysis directly applies to the solution found by temporal difference algorithms when they converge.}
}
Endnote
%0 Conference Paper
%T When is Realizability Sufficient for Off-Policy Reinforcement Learning?
%A Andrea Zanette
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-zanette23a
%I PMLR
%P 40637--40668
%U https://proceedings.mlr.press/v202/zanette23a.html
%V 202
%X Understanding when reinforcement learning algorithms can make successful off-policy predictions, and when they may fail to do so, remains an open problem. Typically, model-free algorithms for reinforcement learning are analyzed under a condition called Bellman completeness when they operate off-policy with function approximation, unless additional conditions are met. However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: they are the metric entropy of the function class and the concentrability coefficient that represents the cost of learning off-policy. The third factor is new, and it measures the violation of Bellman completeness, namely the misalignment between the chosen function class and its image through the Bellman operator. Our analysis directly applies to the solution found by temporal difference algorithms when they converge.
APA
Zanette, A. (2023). When is Realizability Sufficient for Off-Policy Reinforcement Learning? Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:40637-40668. Available from https://proceedings.mlr.press/v202/zanette23a.html.
