Robust Multi-Objective Reinforcement Learning with Dynamic Preferences
Proceedings of The 14th Asian Conference on Machine
Learning, PMLR 189:96-111, 2023.
Abstract
This paper considers multi-objective reinforcement
learning (MORL) when preferences over the multiple
tasks are not perfectly known. Indeed, it is often
the case in practice that an agent is trying to
achieve tasks that may have competing goals but does
not exactly know how to trade them off. The goal of
MORL is thus to learn optimal policies under a set
of possible preferences leading to different
trade-offs on the Pareto frontier. Here, we propose
a new method by considering the dynamics of
preferences over tasks. While this is a more
realistic setup in many scenarios, it also, more
importantly, lets us devise a simple and direct
approach: we consider a surrogate state space made
up of both states and preferences, which leads to a
joint exploration of states and preferences. Static
(and possibly unknown) preferences can also be
understood as a limiting case of our framework. In
sum, this allows us to devise both deep Q-learning
and actor-critic methods based on planning under a
preference-dependent policy and learning the
multi-dimensional value function under that
policy. Finally, the performance and effectiveness
of our method are demonstrated in experiments across
several domains.
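
To make the surrogate-state idea concrete, the sketch below (our own
illustration under assumed details, not the authors' code) shows a
preference-conditioned, vector-valued Q-network: the preference weight
vector is concatenated with the environment state, the network outputs one
Q-vector per action, and actions are selected by scalarizing those vectors
with the current preference. Names such as PrefConditionedQNet, the layer
sizes, and hidden_dim are illustrative assumptions.

    # Minimal sketch of a preference-augmented, multi-dimensional Q-network.
    # Not the paper's implementation; shapes and names are assumptions.
    import torch
    import torch.nn as nn

    class PrefConditionedQNet(nn.Module):
        """Q(s, w) -> (num_actions, num_objectives) vector-valued estimates."""

        def __init__(self, state_dim, num_objectives, num_actions, hidden_dim=128):
            super().__init__()
            self.num_actions = num_actions
            self.num_objectives = num_objectives
            # The surrogate state is the concatenation of state and preference.
            self.net = nn.Sequential(
                nn.Linear(state_dim + num_objectives, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_actions * num_objectives),
            )

        def forward(self, state, preference):
            x = torch.cat([state, preference], dim=-1)
            q = self.net(x)
            return q.view(-1, self.num_actions, self.num_objectives)

    def greedy_action(qnet, state, preference):
        """Pick the action maximizing the preference-scalarized value w^T Q(s, w, a)."""
        with torch.no_grad():
            q = qnet(state.unsqueeze(0), preference.unsqueeze(0))   # (1, A, M)
            scalarized = (q * preference.view(1, 1, -1)).sum(dim=-1)  # (1, A)
            return scalarized.argmax(dim=-1).item()

Under dynamic preferences, the preference fed to the network simply changes
from step to step, so the agent effectively explores the joint space of
states and preferences; a fixed (possibly unknown) preference is the special
case where that input never changes.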