A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning

Kihyuk Hong, Yuhang Li, Ambuj Tewari
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:280-288, 2024.

Abstract

Offline constrained reinforcement learning (RL) aims to learn, from an existing dataset, a policy that maximizes the expected cumulative reward subject to constraints on the expected cumulative cost. In this paper, we propose the Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function estimated by critics. The primal player employs a no-regret policy optimization oracle to maximize the Lagrangian estimate, and the dual player acts greedily to minimize the Lagrangian estimate. We show that PDCA finds a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem. Unlike previous work that requires concentrability and a strong Bellman completeness assumption, PDCA requires only concentrability and realizability assumptions for sample-efficient learning.
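
For readers unfamiliar with the setup, the sketch below records the standard Lagrangian formulation behind these claims; the symbols $V_r^{\pi}$ (expected cumulative reward), $V_c^{\pi}$ (expected cumulative cost), the threshold $\tau$, and the bound $B$ on the dual variable are illustrative assumptions and need not match the paper's exact notation or sign conventions.

\[
\max_{\pi}\; V_r^{\pi}
\quad \text{subject to} \quad V_c^{\pi} \le \tau,
\qquad
L(\pi, \lambda) \;=\; V_r^{\pi} + \lambda\,\bigl(\tau - V_c^{\pi}\bigr),
\quad \lambda \in [0, B].
\]
A pair $(\hat{\pi}, \hat{\lambda})$ is an $\epsilon$-near saddle point of $L$ if
\[
\max_{\pi} L(\pi, \hat{\lambda}) - \epsilon
\;\le\; L(\hat{\pi}, \hat{\lambda}) \;\le\;
\min_{\lambda \in [0, B]} L(\hat{\pi}, \lambda) + \epsilon .
\]

By standard Lagrangian duality arguments, such a pair yields a policy that is nearly optimal and nearly feasible for the constrained problem. In PDCA, the critics supply offline estimates of $V_r^{\pi}$ and $V_c^{\pi}$, the primal player updates $\pi$ with a no-regret oracle to drive the estimated $L$ up, and the dual player chooses $\lambda$ greedily to drive it down.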

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-hong24a,
  title     = {A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning},
  author    = {Hong, Kihyuk and Li, Yuhang and Tewari, Ambuj},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {280--288},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/hong24a/hong24a.pdf},
  url       = {https://proceedings.mlr.press/v238/hong24a.html}
}
Endnote
%0 Conference Paper
%T A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
%A Kihyuk Hong
%A Yuhang Li
%A Ambuj Tewari
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-hong24a
%I PMLR
%P 280--288
%U https://proceedings.mlr.press/v238/hong24a.html
%V 238
APA
Hong, K., Li, Y. & Tewari, A. (2024). A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:280-288. Available from https://proceedings.mlr.press/v238/hong24a.html.
