A Constrained Multi-Objective Reinforcement Learning Framework

Sandy Huang; Abbas Abdolmaleki; Giulia Vezzani; Philemon Brakel; Daniel J. Mankowitz; Michael Neunert; Steven Bohez; Yuval Tassa; Nicolas Heess; Martin Riedmiller; Raia Hadsell

Proceedings of the 5th Conference on Robot Learning, PMLR 164:883-893, 2022.

Abstract

Many real-world problems, especially in robotics, require that reinforcement learning (RL) agents learn policies that not only maximize an environment reward but also satisfy constraints. We propose a high-level framework for solving such problems that treats the environment reward and costs as separate objectives and learns what preference over objectives the policy should optimize for in order to meet the constraints. We call this Learning Preferences and Policies in Parallel (LP3). By making different choices for how to learn the preference and how to optimize the policy given the preference, we can recover existing approaches (e.g., Lagrangian relaxation) and derive novel approaches that lead to better performance. One of these is an algorithm that learns a set of constraint-satisfying policies, which is useful when the exact constraint is not known a priori.
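
For reference, the Lagrangian relaxation that the abstract names as a special case is commonly written as shown below. This is the standard textbook form for constrained RL, not notation taken from the paper: the symbols $J_r$ (expected return), $J_{c_k}$ (expected cost for constraint $k$), $\beta_k$ (cost threshold), and $\lambda_k$ (multiplier) are assumptions introduced here for illustration.

```latex
% Constrained RL problem (standard form; notation is illustrative, not the paper's)
\max_{\pi} \; J_r(\pi)
\quad \text{s.t.} \quad
J_{c_k}(\pi) \le \beta_k, \qquad k = 1, \dots, K

% Lagrangian relaxation: one nonnegative multiplier per constraint
\min_{\lambda \ge 0} \; \max_{\pi} \;
J_r(\pi) - \sum_{k=1}^{K} \lambda_k \bigl( J_{c_k}(\pi) - \beta_k \bigr)
```

Read this way, the multipliers $\lambda_k$ weight the reward against each cost, which is one way to picture a "preference over objectives" being learned alongside the policy.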

Cite this Paper

BibTeX


@InProceedings{pmlr-v164-huang22a,
  title     = {A Constrained Multi-Objective Reinforcement Learning Framework},
  author    = {Huang, Sandy and Abdolmaleki, Abbas and Vezzani, Giulia and Brakel, Philemon and Mankowitz, Daniel J. and Neunert, Michael and Bohez, Steven and Tassa, Yuval and Heess, Nicolas and Riedmiller, Martin and Hadsell, Raia},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {883--893},
  year      = {2022},
  editor    = {Faust, Aleksandra and Hsu, David and Neumann, Gerhard},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  month     = {08--11 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v164/huang22a/huang22a.pdf},
  url       = {https://proceedings.mlr.press/v164/huang22a.html},
  abstract  = {Many real-world problems, especially in robotics, require that reinforcement learning (RL) agents learn policies that not only maximize an environment reward, but also satisfy constraints. We propose a high-level framework for solving such problems, that treats the environment reward and costs as separate objectives, and learns what preference over objectives the policy should optimize for in order to meet the constraints. We call this Learning Preferences and Policies in Parallel (LP3). By making different choices for how to learn the preference and how to optimize for the policy given the preference, we can obtain existing approaches (e.g., Lagrangian relaxation) and derive novel approaches that lead to better performance. One of these is an algorithm that learns a set of constraint-satisfying policies, useful for when we do not know the exact constraint a priori.}
}

Endnote

%0 Conference Paper
%T A Constrained Multi-Objective Reinforcement Learning Framework
%A Sandy Huang
%A Abbas Abdolmaleki
%A Giulia Vezzani
%A Philemon Brakel
%A Daniel J. Mankowitz
%A Michael Neunert
%A Steven Bohez
%A Yuval Tassa
%A Nicolas Heess
%A Martin Riedmiller
%A Raia Hadsell
%B Proceedings of the 5th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Aleksandra Faust
%E David Hsu
%E Gerhard Neumann
%F pmlr-v164-huang22a
%I PMLR
%P 883--893
%U https://proceedings.mlr.press/v164/huang22a.html
%V 164
%X Many real-world problems, especially in robotics, require that reinforcement learning (RL) agents learn policies that not only maximize an environment reward, but also satisfy constraints. We propose a high-level framework for solving such problems, that treats the environment reward and costs as separate objectives, and learns what preference over objectives the policy should optimize for in order to meet the constraints. We call this Learning Preferences and Policies in Parallel (LP3). By making different choices for how to learn the preference and how to optimize for the policy given the preference, we can obtain existing approaches (e.g., Lagrangian relaxation) and derive novel approaches that lead to better performance. One of these is an algorithm that learns a set of constraint-satisfying policies, useful for when we do not know the exact constraint a priori.

APA


Huang, S., Abdolmaleki, A., Vezzani, G., Brakel, P., Mankowitz, D. J., Neunert, M., Bohez, S., Tassa, Y., Heess, N., Riedmiller, M. & Hadsell, R. (2022). A Constrained Multi-Objective Reinforcement Learning Framework. Proceedings of the 5th Conference on Robot Learning, in Proceedings of Machine Learning Research 164:883-893. Available from https://proceedings.mlr.press/v164/huang22a.html.
