Exploring Under Constraints with Model-Based Actor-Critic and Safety Filters

Ahmed Agha; Baris Kayalibay; Atanas Mirchev; Patrick van der Smagt; Justin Bayer

Proceedings of The 8th Conference on Robot Learning, PMLR 270:1216-1230, 2025.

Abstract

Applying reinforcement learning (RL) to learn effective policies on physical robots without supervision remains challenging when it comes to tasks where safe exploration is critical. Constrained model-based RL (CMBRL) presents a promising approach to this problem. These methods are designed to learn constraint-adhering policies through constrained optimization approaches. Yet, such policies often fail to meet stringent safety requirements during learning and exploration. Our solution “CASE” aims to reduce the instances where constraints are breached during the learning phase. Specifically, CASE integrates techniques for optimizing constrained policies and employs planning-based safety filters as backup policies, effectively lowering constraint violations during learning and making it a more reliable option than other recent constrained model-based policy optimization methods.
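The abstract states only that CASE uses planning-based safety filters as backup policies; it does not say how the planner or the backup works. The Python sketch below illustrates the general pattern of a model-based safety filter under assumptions of our own, not the authors' method. It uses a known toy double-integrator model in place of a learned one, a position constraint |p| <= 1, a hypothetical braking controller as the backup, and random-shooting planning. All names (`model`, `brake`, `rollout_safe`, `safety_filter`) and constants are hypothetical.

```python
import random
from typing import Sequence, Tuple

# Toy setting (assumed, not from the paper): state = (position p, velocity v).
State = Tuple[float, float]
DT = 0.1      # time step
P_MAX = 1.0   # safety constraint: |p| <= P_MAX
A_MAX = 1.0   # actuator limit: |a| <= A_MAX


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def model(s: State, a: float) -> State:
    """Assumed dynamics used for look-ahead (a learned model in a real CMBRL agent)."""
    p, v = s
    return (p + DT * v, v + DT * a)


def brake(s: State) -> float:
    """Hypothetical backup controller: decelerate as hard as allowed toward v = 0."""
    return clip(-s[1] / DT, -A_MAX, A_MAX)


def rollout_safe(s: State, first_actions: Sequence[float], horizon: int) -> bool:
    """Apply `first_actions`, then the braking backup, for `horizon` steps in total.
    Return True if every predicted position satisfies the constraint."""
    for t in range(horizon):
        a = first_actions[t] if t < len(first_actions) else brake(s)
        s = model(s, a)
        if abs(s[0]) > P_MAX:
            return False
    return True


def safety_filter(s: State, proposed: float, horizon: int = 30,
                  n_samples: int = 200, plan_len: int = 5, seed: int = 0) -> float:
    """Pass the policy's action through if it admits a safe backup continuation;
    otherwise return the first action of a randomly sampled safe plan, and fall
    back to pure braking if no sampled plan is safe."""
    proposed = clip(proposed, -A_MAX, A_MAX)
    if rollout_safe(s, [proposed], horizon):
        return proposed
    rng = random.Random(seed)
    for _ in range(n_samples):
        plan = [rng.uniform(-A_MAX, A_MAX) for _ in range(plan_len)]
        if rollout_safe(s, plan, horizon):
            return plan[0]
    return brake(s)
```

For example, at rest at the origin, `safety_filter((0.0, 0.0), 0.5)` returns the policy's action 0.5 unchanged. At `(0.8, 0.5)`, close to the boundary and moving toward it, accelerating with +1.0 would carry the model past p = 1 even with full braking afterward, so the filter substitutes a different action. In this sketch the filter only intervenes when the model predicts a violation, which leaves the learned policy free to explore elsewhere.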

Cite this Paper

BibTeX

@InProceedings{pmlr-v270-agha25a,
  title = 	 {Exploring Under Constraints with Model-Based Actor-Critic and Safety Filters},
  author =       {Agha, Ahmed and Kayalibay, Baris and Mirchev, Atanas and van der Smagt, Patrick and Bayer, Justin},
  booktitle = 	 {Proceedings of The 8th Conference on Robot Learning},
  pages = 	 {1216--1230},
  year = 	 {2025},
  editor = 	 {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume = 	 {270},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--09 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v270/main/assets/agha25a/agha25a.pdf},
  url = 	 {https://proceedings.mlr.press/v270/agha25a.html},
  abstract = 	 {Applying reinforcement learning (RL) to learn effective policies on physical robots without supervision remains challenging when it comes to tasks where safe exploration is critical. Constrained model-based RL (CMBRL) presents a promising approach to this problem. These methods are designed to learn constraint-adhering policies through constrained optimization approaches. Yet, such policies often fail to meet stringent safety requirements during learning and exploration. Our solution “CASE” aims to reduce the instances where constraints are breached during the learning phase. Specifically, CASE integrates techniques for optimizing constrained policies and employs planning-based safety filters as backup policies, effectively lowering constraint violations during learning and making it a more reliable option than other recent constrained model-based policy optimization methods.}
}

Endnote

%0 Conference Paper
%T Exploring Under Constraints with Model-Based Actor-Critic and Safety Filters
%A Ahmed Agha
%A Baris Kayalibay
%A Atanas Mirchev
%A Patrick van der Smagt
%A Justin Bayer
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-agha25a
%I PMLR
%P 1216-1230
%U https://proceedings.mlr.press/v270/agha25a.html
%V 270
%X Applying reinforcement learning (RL) to learn effective policies on physical robots without supervision remains challenging when it comes to tasks where safe exploration is critical. Constrained model-based RL (CMBRL) presents a promising approach to this problem. These methods are designed to learn constraint-adhering policies through constrained optimization approaches. Yet, such policies often fail to meet stringent safety requirements during learning and exploration. Our solution “CASE” aims to reduce the instances where constraints are breached during the learning phase. Specifically, CASE integrates techniques for optimizing constrained policies and employs planning-based safety filters as backup policies, effectively lowering constraint violations during learning and making it a more reliable option than other recent constrained model-based policy optimization methods.

APA

Agha, A., Kayalibay, B., Mirchev, A., van der Smagt, P. & Bayer, J. (2025). Exploring Under Constraints with Model-Based Actor-Critic and Safety Filters. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:1216-1230. Available from https://proceedings.mlr.press/v270/agha25a.html.
