Robust Inverse Reinforcement Learning Control with Unknown States

Bosen Lian, Wenqian Xue, Nhan Nguyen
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:750-762, 2025.

Abstract

This paper designs a robust inverse reinforcement learning (IRL) algorithm that observes an expert's inputs and outputs to reconstruct the underlying cost function weights and optimal control policy for discrete-time (DT) output feedback (OPFB) control systems subject to disturbances and unknown states. The expert system is modeled as a zero-sum game in which its OPFB controller minimizes a cost function while robustly mitigating the effect of the worst-case disturbance, achieving a prescribed attenuation level. The expert's inputs and outputs can be observed, but not its states. To enable the learner to replicate the expert's behavior, we first develop a model-based IRL algorithm and then design an equivalent model-free, data-driven version. The latter infers the quadratic cost function weights that yield the expert's static OPFB control policy, using output and input data from both the expert and the learner. The convergence of the proposed algorithms is rigorously validated through theoretical analysis and numerical experiments.
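
A minimal sketch of the kind of discrete-time zero-sum game setup the abstract refers to is given below; the dynamics and the symbols A, B, D, C, Q, R, K, and gamma are illustrative notation assumed for exposition and are not taken from the paper.

% Illustrative (assumed) DT expert dynamics with control u_k, disturbance w_k,
% and measured output y_k; the states x_k are not observed.
\begin{align*}
  x_{k+1} &= A x_k + B u_k + D w_k, \qquad y_k = C x_k, \\
  J(u,w)  &= \sum_{k=0}^{\infty}\left( y_k^{\top} Q\, y_k + u_k^{\top} R\, u_k
             - \gamma^{2} w_k^{\top} w_k \right), \\
  u^{*}   &= \arg\min_{u}\,\max_{w}\, J(u,w), \qquad u_k^{*} = -K y_k \ \text{(static OPFB policy)}.
\end{align*}

In this notation, the forward problem solves the min-max game for the OPFB gain K at attenuation level gamma; the IRL problem described in the abstract runs in the reverse direction, inferring weights (Q, R) under which the expert's observed static OPFB policy is min-max optimal, using only measured inputs and outputs.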

Cite this Paper


BibTeX
@InProceedings{pmlr-v283-lian25a,
  title     = {Robust Inverse Reinforcement Learning Control with Unknown States},
  author    = {Lian, Bosen and Xue, Wenqian and Nguyen, Nhan},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  pages     = {750--762},
  year      = {2025},
  editor    = {Ozay, Necmiye and Balzano, Laura and Panagou, Dimitra and Abate, Alessandro},
  volume    = {283},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--06 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v283/main/assets/lian25a/lian25a.pdf},
  url       = {https://proceedings.mlr.press/v283/lian25a.html}
}
APA
Lian, B., Xue, W. & Nguyen, N. (2025). Robust Inverse Reinforcement Learning Control with Unknown States. Proceedings of the 7th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 283:750-762. Available from https://proceedings.mlr.press/v283/lian25a.html.
