Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation

Asuman E. Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:26769-26791, 2023.

Abstract

Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-making using a pre-collected dataset, without further interaction with the environment. Recent theoretical progress has focused on developing sample-efficient offline RL algorithms under various relaxed assumptions on data coverage and function approximators, especially for handling excessively large state-action spaces. Among these approaches, the framework based on the linear-programming (LP) reformulation of Markov decision processes has shown promise: it enables sample-efficient offline RL with function approximation, under only partial data coverage and realizability assumptions on the function classes, with favorable computational tractability. In this work, we revisit the LP framework for offline RL and provide a new reformulation that advances the existing results in several respects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size. Our key enabler is to introduce proper constraints in the reformulation, rather than the regularization used in the literature, together with careful choices of the function classes and initial state distributions. We hope our insights shed light on the use of LP formulations, and the induced primal-dual minimax optimization, in offline RL.
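
As standard background for the LP framework referenced above (the classical formulation, not the paper's new reformulation), a discounted MDP with reward r, transition kernel P, discount factor gamma, and initial state distribution mu_0 can be solved via the dual LP over normalized occupancy measures mu(s,a), written here in LaTeX notation:

    \max_{\mu \ge 0} \ \sum_{s,a} \mu(s,a)\, r(s,a)
    \text{s.t.} \ \sum_{a} \mu(s,a) = (1-\gamma)\,\mu_0(s) + \gamma \sum_{s',a'} P(s \mid s',a')\, \mu(s',a') \quad \forall s.

Dualizing the flow constraints with multipliers v(s) yields the induced primal-dual minimax problem

    \min_{v} \max_{\mu \ge 0} \ (1-\gamma)\, \mathbb{E}_{s \sim \mu_0}[v(s)] + \sum_{s,a} \mu(s,a) \Big( r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}[v(s')] - v(s) \Big),

whose saddle points recover an optimal value function and occupancy measure; the paper's contribution concerns how this type of formulation is constrained and approximated with function classes in the offline setting.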

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-ozdaglar23a,
  title     = {Revisiting the Linear-Programming Framework for Offline {RL} with General Function Approximation},
  author    = {Ozdaglar, Asuman E. and Pattathil, Sarath and Zhang, Jiawei and Zhang, Kaiqing},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {26769--26791},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/ozdaglar23a/ozdaglar23a.pdf},
  url       = {https://proceedings.mlr.press/v202/ozdaglar23a.html}
}
Endnote
%0 Conference Paper
%T Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation
%A Asuman E. Ozdaglar
%A Sarath Pattathil
%A Jiawei Zhang
%A Kaiqing Zhang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-ozdaglar23a
%I PMLR
%P 26769--26791
%U https://proceedings.mlr.press/v202/ozdaglar23a.html
%V 202
APA
Ozdaglar, A.E., Pattathil, S., Zhang, J. & Zhang, K. (2023). Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:26769-26791. Available from https://proceedings.mlr.press/v202/ozdaglar23a.html.