A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning

Kihyuk Hong, Yuhang Li, Ambuj Tewari
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:280-288, 2024.

Abstract

Offline constrained reinforcement learning (RL) aims to learn, from an existing dataset, a policy that maximizes the expected cumulative reward subject to constraints on the expected cumulative cost. In this paper, we propose the Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function estimated by critics: the primal player employs a no-regret policy optimization oracle to maximize the Lagrangian estimate, while the dual player acts greedily to minimize it. We show that PDCA finds a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem. Unlike previous work that requires concentrability and a strong Bellman completeness assumption, PDCA requires only concentrability and realizability assumptions for sample-efficient learning.
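
The primal-dual loop described above can be pictured with a small sketch. The toy Python snippet below is illustrative only: the fixed per-action estimates q_r and q_c stand in for the critics, the exponentiated-gradient update stands in for the no-regret policy optimization oracle, and the greedy choice of the multiplier lam in [0, lam_max] plays the dual player. None of these quantities, names, or hyperparameters are the paper's actual constructions.

# Minimal sketch of a primal-dual loop on an estimated Lagrangian,
# for a toy one-state problem. Not the paper's algorithm or estimators.
import numpy as np

def pdca_sketch(q_r, q_c, threshold, n_iters=500, lam_max=10.0, lr=0.1):
    """q_r[a], q_c[a]: stand-in critic estimates of the reward/cost value of action a.
    Constraint: the policy's expected cost must stay at or below `threshold`."""
    n = len(q_r)
    policy = np.ones(n) / n          # primal iterate: distribution over actions
    lam = 0.0                        # dual iterate (Lagrange multiplier)
    avg_policy = np.zeros(n)

    for _ in range(n_iters):
        # Per-action payoff of the estimated Lagrangian: reward minus lambda-weighted cost.
        grad = q_r - lam * q_c

        # Primal step (no-regret stand-in): exponentiated-gradient ascent on the Lagrangian.
        policy = policy * np.exp(lr * grad)
        policy /= policy.sum()

        # Dual step (greedy): pick lambda in [0, lam_max] that minimizes the Lagrangian
        # given the current policy, i.e. go to the upper bound when the constraint is violated.
        lam = lam_max if float(policy @ q_c) > threshold else 0.0

        avg_policy += policy

    return avg_policy / n_iters      # averaged iterates approximate a saddle point

# Example (made up): the highest-reward action alone exceeds the cost budget.
q_r = np.array([0.4, 1.0, 0.6])
q_c = np.array([0.1, 0.9, 0.3])
print(pdca_sketch(q_r, q_c, threshold=0.5))

In this toy run the iterates oscillate between chasing reward and pulling back toward the cost budget; it is the averaged iterate, not the last one, that approximates the saddle point, mirroring the role of the near saddle point in the abstract.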

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-hong24a,
  title     = {A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning},
  author    = {Hong, Kihyuk and Li, Yuhang and Tewari, Ambuj},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {280--288},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/hong24a/hong24a.pdf},
  url       = {https://proceedings.mlr.press/v238/hong24a.html}
}
APA
Hong, K., Li, Y. & Tewari, A. (2024). A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:280-288. Available from https://proceedings.mlr.press/v238/hong24a.html.
