A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

Kihyuk Hong, Ambuj Tewari
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:18711-18737, 2024.

Abstract

We study offline reinforcement learning (RL) with linear MDPs in the infinite-horizon discounted setting, which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumption or are computationally inefficient for finding an $\epsilon$-optimal policy with $\mathcal{O}(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves $\mathcal{O}(\epsilon^{-2})$ sample complexity under a partial data coverage assumption, improving upon a recent work that requires $\mathcal{O}(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.
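To illustrate the primal-dual idea described in the abstract, below is a minimal Python sketch, not the paper's exact algorithm. It assumes two hypothetical subroutines: offline_policy_improve(lam), which returns a policy improved for the combined reward $r + \lambda g$ using only the offline dataset (e.g., via a pessimistic linear value-function fit, in the spirit of the linear MDP setting), and offline_policy_eval(pi), which returns an off-policy estimate of the discounted value of the auxiliary reward $g$. The primal player improves the policy against the Lagrangian reward; the dual player runs projected gradient descent on the multiplier $\lambda$ for the constraint that the auxiliary value be at least a threshold tau.

import numpy as np

def primal_dual_offline_crl(offline_policy_improve, offline_policy_eval,
                            tau, lam_max=10.0, lr_dual=0.1, iters=100):
    # Hypothetical sketch of a Lagrangian primal-dual loop for offline
    # constrained RL; the two subroutines are assumptions for illustration,
    # not the paper's API.
    lam = 0.0
    policies = []
    for _ in range(iters):
        # Primal step: improve the policy for the combined reward r + lam * g,
        # estimated from the offline dataset only.
        pi = offline_policy_improve(lam)
        policies.append(pi)
        # Dual step: projected gradient descent on lam, using an off-policy
        # estimate of the constraint value V_g(pi) from the same dataset.
        v_g = offline_policy_eval(pi)
        lam = float(np.clip(lam - lr_dual * (v_g - tau), 0.0, lam_max))
    # Returning all iterates: the uniform mixture of the policy iterates is
    # the usual output of such gradient descent-ascent schemes.
    return policies, lam

Returning the mixture (uniform average) of the policy iterates, rather than only the last one, is the standard way such descent-ascent schemes certify near-optimality of the reward objective while keeping the constraint violation small.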

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-hong24e, title = {A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear {MDP}s}, author = {Hong, Kihyuk and Tewari, Ambuj}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {18711--18737}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hong24e/hong24e.pdf}, url = {https://proceedings.mlr.press/v235/hong24e.html}, abstract = {We study offline reinforcement learning (RL) with linear MDPs in the infinite-horizon discounted setting, which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumption or are computationally inefficient for finding an $\epsilon$-optimal policy with $\mathcal{O}(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves $\mathcal{O}(\epsilon^{-2})$ sample complexity under a partial data coverage assumption, improving upon a recent work that requires $\mathcal{O}(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.} }
Endnote
%0 Conference Paper %T A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs %A Kihyuk Hong %A Ambuj Tewari %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-hong24e %I PMLR %P 18711--18737 %U https://proceedings.mlr.press/v235/hong24e.html %V 235 %X We study offline reinforcement learning (RL) with linear MDPs in the infinite-horizon discounted setting, which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumption or are computationally inefficient for finding an $\epsilon$-optimal policy with $\mathcal{O}(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal-dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves $\mathcal{O}(\epsilon^{-2})$ sample complexity under a partial data coverage assumption, improving upon a recent work that requires $\mathcal{O}(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to the offline constrained RL setting, which enforces constraints on additional reward signals.
APA
Hong, K. & Tewari, A. (2024). A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:18711-18737. Available from https://proceedings.mlr.press/v235/hong24e.html.
