Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods

Riccardo De Santi, Manish Prajapat, Andreas Krause
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:10235-10266, 2024.

Abstract

In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many real-world applications, such as experiment design, exploration, imitation learning, and risk-averse RL. This is because additive objectives disregard interactions between states, which are crucial for certain tasks. To tackle this problem, we introduce Global RL (GRL), where rewards are defined globally over trajectories instead of locally over states. Global rewards can capture negative interactions among states (e.g., in exploration) via submodularity, positive interactions (e.g., synergistic effects) via supermodularity, and mixed interactions via combinations of the two. By exploiting ideas from submodular optimization, we propose a novel algorithmic scheme that converts any GRL problem into a sequence of classic RL problems and solves it efficiently with curvature-dependent approximation guarantees. We also provide hardness-of-approximation results and empirically demonstrate the effectiveness of our method on several GRL instances.
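To make the "sequence of classic RL problems" idea concrete, below is a minimal, hypothetical sketch, not the authors' implementation. It assumes the global reward is a coverage function F(trajectory) = number of distinct states visited (submodular in the set of visited states), and a toy deterministic chain MDP; each outer iteration linearizes F by paying every state its marginal gain with respect to the states covered so far, and the resulting additive-reward problem is solved by finite-horizon backward induction. All constants and function names are illustrative.

# Hypothetical sketch: submodular coverage reward linearized into per-state
# marginal gains, then solved as a classic finite-horizon RL problem.
import numpy as np

N_STATES, HORIZON = 8, 4
ACTIONS = (-1, +1)  # move left / right along the chain
START = 3


def step(s, a):
    """Deterministic chain dynamics, clipped at the boundaries."""
    return int(np.clip(s + a, 0, N_STATES - 1))


def marginal_gain_rewards(covered):
    """Semi-gradient of the coverage objective F = |set of visited states|:
    a state's marginal gain is 1 if it is not yet covered, 0 otherwise."""
    return np.array([0.0 if s in covered else 1.0 for s in range(N_STATES)])


def solve_classic_rl(rewards):
    """Solve the linearized (additive-reward) problem by backward induction
    and roll out the greedy finite-horizon policy from START."""
    V = np.zeros(N_STATES)
    policy = np.zeros((HORIZON, N_STATES), dtype=int)
    for t in reversed(range(HORIZON)):
        Q = np.array([[rewards[step(s, a)] + V[step(s, a)] for a in ACTIONS]
                      for s in range(N_STATES)])
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    traj, s = [START], START
    for t in range(HORIZON):
        s = step(s, ACTIONS[policy[t, s]])
        traj.append(s)
    return traj


covered = set()
for it in range(3):  # outer iterations: linearize the global reward, then solve
    traj = solve_classic_rl(marginal_gain_rewards(covered))
    covered.update(traj)
    print(f"iteration {it}: trajectory {traj}, states covered so far: {len(covered)}")

Across iterations the linearized rewards vanish on already-covered states, so each inner classic RL solve is steered toward states not yet visited; this is the coverage-style exploration behaviour that a purely additive, state-local reward cannot express.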

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-de-santi24b,
  title     = {Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods},
  author    = {De Santi, Riccardo and Prajapat, Manish and Krause, Andreas},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {10235--10266},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/de-santi24b/de-santi24b.pdf},
  url       = {https://proceedings.mlr.press/v235/de-santi24b.html}
}
Endnote
%0 Conference Paper
%T Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods
%A Riccardo De Santi
%A Manish Prajapat
%A Andreas Krause
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-de-santi24b
%I PMLR
%P 10235--10266
%U https://proceedings.mlr.press/v235/de-santi24b.html
%V 235
APA
De Santi, R., Prajapat, M. & Krause, A. (2024). Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:10235-10266. Available from https://proceedings.mlr.press/v235/de-santi24b.html.
