Bounded robustness in reinforcement learning via lexicographic objectives

Daniel Jarne Ornia, Licio Romao, Lewis Hammond, Manuel Mazo Jr, Alessandro Abate
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:954-967, 2024.

Abstract

Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator interpretation of the disturbances, and establish connections between robustness and properties of the noise kernel and of the underlying MDPs. Then, we construct sufficient conditions for policy robustness, and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation, while preserving convergence and sub-optimality in the policy synthesis.
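
To make the two ideas in the abstract concrete, here is a minimal, self-contained sketch (not the authors' implementation; every specific below, including the noise kernel K, the robustness surrogate, the tolerance eps and the multiplier schedule, is an illustrative assumption). It treats observational noise as a row-stochastic matrix K acting linearly on a tabular softmax policy, so the policy perceived at state s is sum_{s'} K(s, s') pi(.|s'), and it runs a lexicographic-style policy-gradient loop in which a robustness term only influences the update while the expected return stays within a tolerance of the best value found.

# Illustrative sketch only: observation noise as a stochastic linear operator K
# on the policy, plus a lexicographic-style trade-off between return and a
# robustness surrogate. All quantities here are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))        # transition kernel P[s, a, s']
Rw = rng.uniform(size=(nS, nA))                       # reward table
K = 0.8 * np.eye(nS) + 0.2 * rng.dirichlet(np.ones(nS), size=nS)  # noise kernel (row-stochastic)
K /= K.sum(axis=1, keepdims=True)

def policy(theta):
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)           # tabular softmax policy

def expected_return(theta):
    pi = policy(theta)
    Ppi = np.einsum('sa,sap->sp', pi, P)               # state-to-state kernel under pi
    rpi = (pi * Rw).sum(axis=1)
    v = np.linalg.solve(np.eye(nS) - gamma * Ppi, rpi) # exact policy evaluation
    return v.mean()                                    # uniform initial distribution

def robustness(theta):
    pi = policy(theta)
    pi_noisy = K @ pi                                  # policy as seen through the noise operator
    return -np.abs(pi - pi_noisy).sum(axis=1).mean()   # negative total-variation shift

def fd_grad(f, theta, h=1e-4):
    # Finite-difference gradient; exact gradients would be used in practice.
    g = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        d = np.zeros_like(theta)
        d[idx] = h
        g[idx] = (f(theta + d) - f(theta - d)) / (2 * h)
    return g

theta, lam, eps, best_J = np.zeros((nS, nA)), 0.0, 0.05, -np.inf
for t in range(500):
    J = expected_return(theta)
    best_J = max(best_J, J)
    # Lexicographic relaxation: the return gradient always applies; the
    # robustness gradient is weighted by lam, which is pushed towards zero
    # whenever the return falls more than eps below the best value seen.
    lam = min(1.0, max(0.0, lam + 0.1 * (J - (best_J - eps))))
    theta += 0.5 * (fd_grad(expected_return, theta) + lam * fd_grad(robustness, theta))
print(expected_return(theta), robustness(theta))

The lexicographic trade-off lives in the multiplier update: the return objective is always pursued, while the robustness surrogate only shapes the update when the return is within the tolerance band, mirroring the bounded sub-optimality described in the abstract.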

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-jarne-ornia24a,
  title     = {Bounded robustness in reinforcement learning via lexicographic objectives},
  author    = {Jarne Ornia, Daniel and Romao, Licio and Hammond, Lewis and Jr, Manuel Mazo and Abate, Alessandro},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {954--967},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/jarne-ornia24a/jarne-ornia24a.pdf},
  url       = {https://proceedings.mlr.press/v242/jarne-ornia24a.html},
  abstract  = {Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator interpretation of the disturbances, and establish connections between robustness and properties of the noise kernel and of the underlying MDPs. Then, we construct sufficient conditions for policy robustness, and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation, while preserving convergence and sub-optimality in the policy synthesis.}
}
Endnote
%0 Conference Paper
%T Bounded robustness in reinforcement learning via lexicographic objectives
%A Daniel Jarne Ornia
%A Licio Romao
%A Lewis Hammond
%A Manuel Mazo Jr
%A Alessandro Abate
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-jarne-ornia24a
%I PMLR
%P 954--967
%U https://proceedings.mlr.press/v242/jarne-ornia24a.html
%V 242
%X Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator interpretation of the disturbances, and establish connections between robustness and properties of the noise kernel and of the underlying MDPs. Then, we construct sufficient conditions for policy robustness, and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation, while preserving convergence and sub-optimality in the policy synthesis.
APA
Jarne Ornia, D., Romao, L., Hammond, L., Jr, M.M. & Abate, A. (2024). Bounded robustness in reinforcement learning via lexicographic objectives. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:954-967. Available from https://proceedings.mlr.press/v242/jarne-ornia24a.html.
