Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret

Asaf B Cassel, Tomer Koren
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1304-1313, 2021.

Abstract

We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.
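For readers unfamiliar with the setting, the following is a minimal, illustrative sketch of a model-free (zeroth-order) policy gradient loop for a linear state-feedback controller u_t = K x_t on a toy LQR instance with dynamics x_{t+1} = A x_t + B u_t + w_t and stage cost x_t^T Q x_t + u_t^T R u_t. The matrices, step sizes, and helper names (rollout_cost, zeroth_order_grad) are all assumptions chosen for illustration; this is a generic one-point gradient estimator, not the specific algorithm or analysis of the paper.

    import numpy as np

    # Illustrative 2-dimensional LQR instance (values are assumptions, not from the paper).
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = 0.1 * np.eye(1)

    def rollout_cost(K, horizon=200, noise=0.01, rng=None):
        """Average quadratic cost of the linear policy u_t = K x_t over one noisy rollout."""
        rng = rng or np.random.default_rng()
        x = np.zeros(2)
        total = 0.0
        for _ in range(horizon):
            u = K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u + noise * rng.standard_normal(2)
        return total / horizon

    def zeroth_order_grad(K, radius=0.05, rng=None):
        """One-point gradient estimate: perturb K on a sphere and rescale the observed cost."""
        rng = rng or np.random.default_rng()
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)
        d = K.size
        return (d / radius) * rollout_cost(K + radius * U, rng=rng) * U

    # Gradient descent in policy space from an (assumed) stabilizing initial controller.
    K = np.array([[-0.5, -1.0]])
    for step in range(50):
        K -= 1e-3 * zeroth_order_grad(K)
    print("final average cost:", rollout_cost(K))

The design point this sketch highlights is the model-free aspect: the controller K is updated using only observed rollout costs, with no estimate of (A, B), which is exactly the regime in which the paper establishes $\sqrt{T}$ regret.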

Cite this Paper

BibTeX
@InProceedings{pmlr-v139-cassel21a,
  title     = {Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret},
  author    = {Cassel, Asaf B and Koren, Tomer},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {1304--1313},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/cassel21a/cassel21a.pdf},
  url       = {https://proceedings.mlr.press/v139/cassel21a.html},
  abstract  = {We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.}
}
Endnote
%0 Conference Paper
%T Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret
%A Asaf B Cassel
%A Tomer Koren
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-cassel21a
%I PMLR
%P 1304--1313
%U https://proceedings.mlr.press/v139/cassel21a.html
%V 139
%X We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.
APA
Cassel, A. B., & Koren, T. (2021). Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with $\sqrt{T}$ Regret. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:1304-1313. Available from https://proceedings.mlr.press/v139/cassel21a.html.