Constrained Reinforcement Learning Under Model Mismatch

Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:47017-47032, 2024.

Abstract

Existing studies on constrained reinforcement learning (RL) may obtain a policy that performs well in the training environment. However, when deployed in a real environment, such a policy may easily violate constraints that were originally satisfied during training, because there may be a model mismatch between the training and real environments. To address this challenge, we formulate the problem as constrained RL under model uncertainty, where the goal is to learn a policy that optimizes the reward and, at the same time, satisfies the constraint under model mismatch. We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large/continuous state spaces and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during training. We show the effectiveness of our algorithm on a set of RL tasks with constraints.
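
For concreteness, the robust constrained formulation described in the abstract can be sketched in standard robust constrained MDP notation (the symbols below are illustrative assumptions, not taken from the paper itself):

    \max_{\pi} \; \min_{P \in \mathcal{P}} \; V_r^{\pi, P}(\rho)
    \quad \text{s.t.} \quad
    \min_{P \in \mathcal{P}} \; V_c^{\pi, P}(\rho) \ge b,

where \mathcal{P} is an uncertainty set of transition kernels capturing the model mismatch, V_r^{\pi, P} and V_c^{\pi, P} are the reward and constraint (utility) value functions of policy \pi under kernel P, \rho is the initial state distribution, and b is the constraint threshold. Under this reading, both the objective and the constraint are evaluated under the worst-case model in \mathcal{P}.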

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-sun24d,
  title     = {Constrained Reinforcement Learning Under Model Mismatch},
  author    = {Sun, Zhongchang and He, Sihong and Miao, Fei and Zou, Shaofeng},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {47017--47032},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/sun24d/sun24d.pdf},
  url       = {https://proceedings.mlr.press/v235/sun24d.html},
  abstract  = {Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address this challenge, we formulate the problem as constrained RL under model uncertainty, where the goal is to learn a policy that optimizes the reward and at the same time satisfies the constraint under model mismatch. We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large/continuous state space and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during the training. We show the effectiveness of our algorithm on a set of RL tasks with constraints.}
}
Endnote
%0 Conference Paper
%T Constrained Reinforcement Learning Under Model Mismatch
%A Zhongchang Sun
%A Sihong He
%A Fei Miao
%A Shaofeng Zou
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-sun24d
%I PMLR
%P 47017--47032
%U https://proceedings.mlr.press/v235/sun24d.html
%V 235
%X Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address this challenge, we formulate the problem as constrained RL under model uncertainty, where the goal is to learn a policy that optimizes the reward and at the same time satisfies the constraint under model mismatch. We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large/continuous state space and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during the training. We show the effectiveness of our algorithm on a set of RL tasks with constraints.
APA
Sun, Z., He, S., Miao, F. & Zou, S. (2024). Constrained Reinforcement Learning Under Model Mismatch. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:47017-47032. Available from https://proceedings.mlr.press/v235/sun24d.html.
