Robust Multi-Objective Reinforcement Learning with Dynamic Preferences

Francois Buet-Golfouse, Parth Pahwa
Proceedings of The 14th Asian Conference on Machine Learning, PMLR 189:96-111, 2023.

Abstract

This paper considers multi-objective reinforcement learning (MORL) when preferences over the multiple tasks are not perfectly known. Indeed, it is often the case in practice that an agent is trying to achieve tasks that may have competing goals but does not exactly know how to trade them off. The goal of MORL is thus to learn optimal policies under a set of possible preferences leading to different trade-offs on the Pareto frontier. Here, we propose a new method by considering the dynamics of preferences over tasks. While this is a more realistic setup in many scenarios, more importantly, it helps us devise a simple and straightforward approach by considering a surrogate state space made up of both states and preferences, which leads to a joint exploration of states and preferences. Static (and possibly unknown) preferences can also be understood as a limiting case of our framework. In sum, this allows us to devise both deep Q-learning and actor-critic methods based on planning under a preference-dependent policy and learning the multi-dimensional value function under said policy. Finally, the performance and effectiveness of our method are demonstrated in experiments run on different domains.
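The paper's method is not reproduced on this page, but as a rough illustration of the surrogate-state idea described in the abstract, the sketch below conditions a Q-network on both the environment state and the current preference vector, and outputs one Q-value per action and objective; actions are then chosen by scalarising these values with the preference weights. All names, the architecture, and the linear scalarisation here are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the authors' code) of a preference-conditioned Q-network:
# the "surrogate state" is the environment state concatenated with the preference vector,
# and the network outputs a multi-dimensional Q-value per action (one entry per objective).
import torch
import torch.nn as nn


class PreferenceConditionedQNet(nn.Module):
    def __init__(self, state_dim: int, num_objectives: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.num_actions = num_actions
        self.num_objectives = num_objectives
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_objectives, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions * num_objectives),
        )

    def forward(self, state: torch.Tensor, preference: torch.Tensor) -> torch.Tensor:
        # Multi-dimensional Q-values of shape (batch, num_actions, num_objectives).
        x = torch.cat([state, preference], dim=-1)
        return self.net(x).view(-1, self.num_actions, self.num_objectives)


def greedy_action(qnet: PreferenceConditionedQNet,
                  state: torch.Tensor,
                  preference: torch.Tensor) -> torch.Tensor:
    # Scalarise the vector-valued Q-function with the current preference weights
    # and act greedily with respect to the scalarised values.
    q = qnet(state, preference)                          # (batch, actions, objectives)
    scalarised = (q * preference.unsqueeze(1)).sum(-1)   # (batch, actions)
    return scalarised.argmax(dim=-1)


if __name__ == "__main__":
    qnet = PreferenceConditionedQNet(state_dim=4, num_objectives=2, num_actions=3)
    s = torch.randn(1, 4)
    w = torch.tensor([[0.7, 0.3]])  # example preference over two objectives
    print(greedy_action(qnet, s, w))
```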

Cite this Paper


BibTeX
@InProceedings{pmlr-v189-buet-golfouse23a,
  title     = {Robust Multi-Objective Reinforcement Learning with Dynamic Preferences},
  author    = {Buet-Golfouse, Francois and Pahwa, Parth},
  booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
  pages     = {96--111},
  year      = {2023},
  editor    = {Khan, Emtiyaz and Gonen, Mehmet},
  volume    = {189},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--14 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v189/buet-golfouse23a/buet-golfouse23a.pdf},
  url       = {https://proceedings.mlr.press/v189/buet-golfouse23a.html}
}
APA
Buet-Golfouse, F. & Pahwa, P. (2023). Robust Multi-Objective Reinforcement Learning with Dynamic Preferences. Proceedings of The 14th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 189:96-111. Available from https://proceedings.mlr.press/v189/buet-golfouse23a.html.
