Multi-objective Linear Reinforcement Learning with Lexicographic Rewards

Bo Xue, Dake Bu, Ji Cheng, Yuanyu Wan, Qingfu Zhang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:70065-70085, 2025.

Abstract

Reinforcement Learning (RL) with linear transition kernels and reward functions has recently attracted growing attention due to its computational efficiency and theoretical advancements. However, prior theoretical research in RL has primarily focused on single-objective problems, resulting in limited theoretical development for multi-objective reinforcement learning (MORL). To bridge this gap, we examine MORL under lexicographic reward structures, where rewards comprise $m$ hierarchically ordered objectives. In this framework, the agent maximizes objectives sequentially, prioritizing the highest-priority objective before considering subsequent ones. We introduce the first MORL algorithm with provable regret guarantees. For any objective $i \in \{1, 2, \ldots, m\}$, our algorithm achieves a regret bound of $\widetilde{O}(\Lambda^i(\lambda) \cdot \sqrt{d^2H^4 K})$, where $\Lambda^i(\lambda) = 1 + \lambda + \cdots + \lambda^{i-1}$, $\lambda$ quantifies the trade-off between conflicting objectives, $d$ is the feature dimension, $H$ is the episode length, and $K$ is the number of episodes. Furthermore, our algorithm can be applied in the misspecified setting, where the regret bound for the $i$-th objective becomes $\widetilde{O}(\Lambda^i(\lambda)\cdot(\sqrt{d^2H^4K}+\epsilon dH^2K))$, with $\epsilon$ denoting the degree of misspecification.
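To give a concrete sense of how the stated bounds scale across objectives, the short sketch below evaluates the leading terms $\Lambda^i(\lambda)\sqrt{d^2H^4K}$ and $\Lambda^i(\lambda)(\sqrt{d^2H^4K}+\epsilon dH^2K)$ for illustrative parameter values; the function names and the chosen constants ($\lambda = 0.5$, $d = 10$, $H = 20$, $K = 10^4$) are our own illustrations, not values from the paper, and logarithmic factors and absolute constants hidden by $\widetilde{O}(\cdot)$ are ignored.

import math

def lex_weight(i: int, lam: float) -> float:
    # Lambda^i(lambda) = 1 + lambda + ... + lambda^(i-1): geometric sum over the first i terms.
    return sum(lam ** j for j in range(i))

def regret_bound(i: int, lam: float, d: int, H: int, K: int, eps: float = 0.0) -> float:
    # Leading term of the stated bound for objective i, ignoring log factors and constants.
    # Well-specified case (eps = 0): Lambda^i(lambda) * sqrt(d^2 H^4 K).
    # Misspecified case (eps > 0):   Lambda^i(lambda) * (sqrt(d^2 H^4 K) + eps * d * H^2 * K).
    main_term = math.sqrt(d ** 2 * H ** 4 * K)
    return lex_weight(i, lam) * (main_term + eps * d * H ** 2 * K)

# Illustrative values only: lower-priority objectives pay a larger multiplicative
# factor Lambda^i(lambda), but when lambda < 1 it stays bounded by 1 / (1 - lambda).
for i in (1, 2, 3):
    print(i, regret_bound(i, lam=0.5, d=10, H=20, K=10_000))

Note that for a fixed $\lambda < 1$ the factor $\Lambda^i(\lambda)$ is bounded independently of $i$, so every objective inherits the same $\widetilde{O}(\sqrt{K})$ dependence on the number of episodes.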

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-xue25a,
  title     = {Multi-objective Linear Reinforcement Learning with Lexicographic Rewards},
  author    = {Xue, Bo and Bu, Dake and Cheng, Ji and Wan, Yuanyu and Zhang, Qingfu},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {70065--70085},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/xue25a/xue25a.pdf},
  url       = {https://proceedings.mlr.press/v267/xue25a.html},
  abstract  = {Reinforcement Learning (RL) with linear transition kernels and reward functions has recently attracted growing attention due to its computational efficiency and theoretical advancements. However, prior theoretical research in RL has primarily focused on single-objective problems, resulting in limited theoretical development for multi-objective reinforcement learning (MORL). To bridge this gap, we examine MORL under lexicographic reward structures, where rewards comprise $m$ hierarchically ordered objectives. In this framework, the agent maximizes objectives sequentially, prioritizing the highest-priority objective before considering subsequent ones. We introduce the first MORL algorithm with provable regret guarantees. For any objective $i \in \{1, 2, \ldots, m\}$, our algorithm achieves a regret bound of $\widetilde{O}(\Lambda^i(\lambda) \cdot \sqrt{d^2H^4 K})$, where $\Lambda^i(\lambda) = 1 + \lambda + \cdots + \lambda^{i-1}$, $\lambda$ quantifies the trade-off between conflicting objectives, $d$ is the feature dimension, $H$ is the episode length, and $K$ is the number of episodes. Furthermore, our algorithm can be applied in the misspecified setting, where the regret bound for the $i$-th objective becomes $\widetilde{O}(\Lambda^i(\lambda)\cdot(\sqrt{d^2H^4K}+\epsilon dH^2K))$, with $\epsilon$ denoting the degree of misspecification.}
}
Endnote
%0 Conference Paper
%T Multi-objective Linear Reinforcement Learning with Lexicographic Rewards
%A Bo Xue
%A Dake Bu
%A Ji Cheng
%A Yuanyu Wan
%A Qingfu Zhang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-xue25a
%I PMLR
%P 70065--70085
%U https://proceedings.mlr.press/v267/xue25a.html
%V 267
%X Reinforcement Learning (RL) with linear transition kernels and reward functions has recently attracted growing attention due to its computational efficiency and theoretical advancements. However, prior theoretical research in RL has primarily focused on single-objective problems, resulting in limited theoretical development for multi-objective reinforcement learning (MORL). To bridge this gap, we examine MORL under lexicographic reward structures, where rewards comprise $m$ hierarchically ordered objectives. In this framework, the agent maximizes objectives sequentially, prioritizing the highest-priority objective before considering subsequent ones. We introduce the first MORL algorithm with provable regret guarantees. For any objective $i \in \{1, 2, \ldots, m\}$, our algorithm achieves a regret bound of $\widetilde{O}(\Lambda^i(\lambda) \cdot \sqrt{d^2H^4 K})$, where $\Lambda^i(\lambda) = 1 + \lambda + \cdots + \lambda^{i-1}$, $\lambda$ quantifies the trade-off between conflicting objectives, $d$ is the feature dimension, $H$ is the episode length, and $K$ is the number of episodes. Furthermore, our algorithm can be applied in the misspecified setting, where the regret bound for the $i$-th objective becomes $\widetilde{O}(\Lambda^i(\lambda)\cdot(\sqrt{d^2H^4K}+\epsilon dH^2K))$, with $\epsilon$ denoting the degree of misspecification.
APA
Xue, B., Bu, D., Cheng, J., Wan, Y. & Zhang, Q. (2025). Multi-objective Linear Reinforcement Learning with Lexicographic Rewards. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:70065-70085. Available from https://proceedings.mlr.press/v267/xue25a.html.