Robust Multi-Objective Reinforcement Learning with Dynamic Preferences

Francois Buet-Golfouse, Parth Pahwa
Proceedings of The 14th Asian Conference on Machine Learning, PMLR 189:96-111, 2023.

Abstract

This paper considers multi-objective reinforcement learning (MORL) when preferences over the multiple tasks are not perfectly known. Indeed, it is often the case in practice that an agent is trying to achieve tasks that may have competing goals but does not exactly know how to trade them off. The goal of MORL is thus to learn optimal policies under a set of possible preferences leading to different trade-offs on the Pareto frontier. Here, we propose a new method by considering the dynamics of preferences over tasks. While this is a more realistic setup in many scenarios, more importantly, it helps us devise a simple and straightforward approach by considering a surrogate state space made up of both states and preferences, which leads to a joint exploration of states and preferences. Static (and possibly unknown) preferences can also be understood as a limiting case of our framework. In sum, this allows us to devise both deep Q-learning and actor-critic methods based on planning under a preference-dependent policy and learning the multi-dimensional value function under said policy. Finally, the performance and effectiveness of our method are demonstrated in experiments run on different domains.
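The paper's method is not reproduced on this page, but as a rough illustration of the surrogate-state idea described in the abstract, the sketch below conditions a Q-network on both the environment state and the current preference vector, and outputs one Q-value per action and objective; actions are then chosen by scalarising these values with the preference weights. All names, the architecture, and the linear scalarisation here are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the authors' code) of a preference-conditioned Q-network:
# the "surrogate state" is the environment state concatenated with the preference vector,
# and the network outputs a multi-dimensional Q-value per action (one entry per objective).
import torch
import torch.nn as nn


class PreferenceConditionedQNet(nn.Module):
    def __init__(self, state_dim: int, num_objectives: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.num_actions = num_actions
        self.num_objectives = num_objectives
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_objectives, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions * num_objectives),
        )

    def forward(self, state: torch.Tensor, preference: torch.Tensor) -> torch.Tensor:
        # Multi-dimensional Q-values of shape (batch, num_actions, num_objectives).
        x = torch.cat([state, preference], dim=-1)
        return self.net(x).view(-1, self.num_actions, self.num_objectives)


def greedy_action(qnet: PreferenceConditionedQNet,
                  state: torch.Tensor,
                  preference: torch.Tensor) -> torch.Tensor:
    # Scalarise the vector-valued Q-function with the current preference weights
    # and act greedily with respect to the scalarised values.
    q = qnet(state, preference)                          # (batch, actions, objectives)
    scalarised = (q * preference.unsqueeze(1)).sum(-1)   # (batch, actions)
    return scalarised.argmax(dim=-1)


if __name__ == "__main__":
    qnet = PreferenceConditionedQNet(state_dim=4, num_objectives=2, num_actions=3)
    s = torch.randn(1, 4)
    w = torch.tensor([[0.7, 0.3]])  # example preference over two objectives
    print(greedy_action(qnet, s, w))
```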

Cite this Paper


BibTeX
@InProceedings{pmlr-v189-buet-golfouse23a,
  title     = {Robust Multi-Objective Reinforcement Learning with Dynamic Preferences},
  author    = {Buet-Golfouse, Francois and Pahwa, Parth},
  booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
  pages     = {96--111},
  year      = {2023},
  editor    = {Khan, Emtiyaz and Gonen, Mehmet},
  volume    = {189},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--14 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v189/buet-golfouse23a/buet-golfouse23a.pdf},
  url       = {https://proceedings.mlr.press/v189/buet-golfouse23a.html}
}
APA
Buet-Golfouse, F. & Pahwa, P. (2023). Robust Multi-Objective Reinforcement Learning with Dynamic Preferences. Proceedings of The 14th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 189:96-111. Available from https://proceedings.mlr.press/v189/buet-golfouse23a.html.
