An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems

Mohammad Alsalti, Victor G. Lopez, Matthias A. Müller
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:312-323, 2024.

Abstract

In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm only relies on using persistently exciting input-output data, measured offline. No model knowledge or state measurements are needed and the obtained optimal policy only uses past input-output information. Moreover, our formulation of the proposed algorithm renders it computationally efficient. We provide conditions that guarantee the convergence of the algorithm to the optimal solution. Finally, the performance of our method is compared to existing algorithms in the literature.
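For readers who want a concrete picture of the ingredients mentioned in the abstract (offline persistently exciting input-output data, an off-policy Q-function evaluation, and a policy that feeds back only past inputs and outputs), the sketch below illustrates a generic least-squares Q-learning / policy-iteration scheme of this flavor. It is not the algorithm of the paper and does not reproduce its computationally efficient formulation; the plant matrices, window length l, cost weights, and the random exploration input are hypothetical choices made only so that the script runs end to end.

```python
# Minimal, illustrative sketch of off-policy Q-learning for output-feedback LQR.
# NOT the paper's algorithm. Assumptions (hypothetical, for illustration only):
#  - the extended "state" z_k stacks the past l inputs and outputs, which is a valid
#    (non-minimal) state for observable LTI systems when l >= observability index;
#  - data come from one offline experiment with an exciting random input;
#  - the Q-function is quadratic, Q(z, u) = [z; u]^T H [z; u].
import numpy as np

rng = np.random.default_rng(0)

# Unknown LTI plant, used only to generate data (never by the learner)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
n, m, p, l = 2, 1, 1, 2
nz = l * (m + p)                 # dimension of the extended state z
Qy, R = 1.0, 0.1                 # stage cost: y^T Qy y + u^T R u

# Offline data collection with a persistently exciting (random) input
T = 400
u = 0.5 * rng.standard_normal((T, m))
x = np.zeros((T + 1, n)); y = np.zeros((T, p))
for k in range(T):
    y[k] = C @ x[k]
    x[k + 1] = A @ x[k] + B @ u[k]

def zvec(k):
    """Extended state at time k: the past l inputs and outputs."""
    return np.concatenate([u[k - l:k].ravel(), y[k - l:k].ravel()])

def phi(v):
    """Features of the quadratic form v^T H v (upper-triangular parametrization)."""
    iu = np.triu_indices(len(v))
    w = np.where(iu[0] == iu[1], 1.0, 2.0)       # double the off-diagonal terms
    return w * np.outer(v, v)[iu]

def unpack(theta, d):
    """Recover the symmetric matrix H from its upper-triangular parameters."""
    H = np.zeros((d, d)); H[np.triu_indices(d)] = theta
    return H + H.T - np.diag(np.diag(H))

# Off-policy policy iteration, reusing the same stored data in every iteration
K = np.zeros((m, nz))            # initial policy; stabilizing here since the plant is stable
d = nz + m
for it in range(15):
    Phi, c = [], []
    for k in range(l, T - 1):
        zk, zk1 = zvec(k), zvec(k + 1)
        vk = np.concatenate([zk, u[k]])          # behavior (exploratory) action
        wk = np.concatenate([zk1, K @ zk1])      # target-policy action at the next step
        Phi.append(phi(vk) - phi(wk))            # Bellman-equation regressor
        c.append(float(y[k] @ (Qy * y[k]) + u[k] @ (R * u[k])))
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = unpack(theta, d)
    K_new = -np.linalg.solve(H[nz:, nz:], H[nz:, :nz])   # greedy policy improvement
    print(f"iter {it}: ||K_new - K|| = {np.linalg.norm(K_new - K):.2e}")
    if np.linalg.norm(K_new - K) < 1e-8:
        break
    K = K_new

print("learned output-feedback gain (acts on past inputs and outputs):\n", K)
```

In this sketch the stacked past inputs and outputs play the role of a non-minimal state, so the learned gain is an output-feedback policy in the spirit of the abstract: it acts only on measured input-output data, never on the internal plant state.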

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-alsalti24a,
  title     = {An efficient data-based off-policy {Q}-learning algorithm for optimal output feedback control of linear systems},
  author    = {Alsalti, Mohammad and Lopez, Victor G. and M\"{u}ller, Matthias A.},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {312--323},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/alsalti24a/alsalti24a.pdf},
  url       = {https://proceedings.mlr.press/v242/alsalti24a.html},
  abstract  = {In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm only relies on using persistently exciting input-output data, measured offline. No model knowledge or state measurements are needed and the obtained optimal policy only uses past input-output information. Moreover, our formulation of the proposed algorithm renders it computationally efficient. We provide conditions that guarantee the convergence of the algorithm to the optimal solution. Finally, the performance of our method is compared to existing algorithms in the literature.}
}
Endnote
%0 Conference Paper
%T An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems
%A Mohammad Alsalti
%A Victor G. Lopez
%A Matthias A. Müller
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-alsalti24a
%I PMLR
%P 312--323
%U https://proceedings.mlr.press/v242/alsalti24a.html
%V 242
%X In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm only relies on using persistently exciting input-output data, measured offline. No model knowledge or state measurements are needed and the obtained optimal policy only uses past input-output information. Moreover, our formulation of the proposed algorithm renders it computationally efficient. We provide conditions that guarantee the convergence of the algorithm to the optimal solution. Finally, the performance of our method is compared to existing algorithms in the literature.
APA
Alsalti, M., Lopez, V. G., & Müller, M. A. (2024). An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:312-323. Available from https://proceedings.mlr.press/v242/alsalti24a.html.
