A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

Kihyuk Hong; Ambuj Tewari

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

Kihyuk Hong, Ambuj Tewari

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:18711-18737, 2024.

Abstract

We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumptions or are computationally inefficient for finding an

$\epsilon$ -optimal policy with

$\mathcal{O}(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves sample complexity of

$\mathcal{O}(\epsilon^{-2})$ with partial data coverage assumption. Our work is an improvement upon a recent work that requires

$\mathcal{O}(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to work in the offline constrained RL setting that enforces constraints on additional reward signals.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-hong24e,
  title = 	 {A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear {MDP}s},
  author =       {Hong, Kihyuk and Tewari, Ambuj},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {18711--18737},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hong24e/hong24e.pdf},
  url = 	 {https://proceedings.mlr.press/v235/hong24e.html},
  abstract = 	 {We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumptions or are computationally inefficient for finding an $\epsilon$-optimal policy with $\mathcal{O}(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves sample complexity of $\mathcal{O}(\epsilon^{-2})$ with partial data coverage assumption. Our work is an improvement upon a recent work that requires $\mathcal{O}(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to work in the offline constrained RL setting that enforces constraints on additional reward signals.}
}

Endnote

%0 Conference Paper
%T A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs
%A Kihyuk Hong
%A Ambuj Tewari
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp	
%F pmlr-v235-hong24e
%I PMLR
%P 18711--18737
%U https://proceedings.mlr.press/v235/hong24e.html
%V 235
%X We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumptions or are computationally inefficient for finding an $\epsilon$-optimal policy with $\mathcal{O}(\epsilon^{-2})$ sample complexity. In this paper, we propose a primal dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves sample complexity of $\mathcal{O}(\epsilon^{-2})$ with partial data coverage assumption. Our work is an improvement upon a recent work that requires $\mathcal{O}(\epsilon^{-4})$ samples. Moreover, we extend our algorithm to work in the offline constrained RL setting that enforces constraints on additional reward signals.

APA


Hong, K. & Tewari, A.. (2024). A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:18711-18737 Available from https://proceedings.mlr.press/v235/hong24e.html.

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

Abstract

Cite this Paper

Related Material