Online Linear Quadratic Control
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1029-1038, 2018.
Abstract
We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee $O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. Crucially, and in contrast to previously proposed relaxations, the feasible solutions of our SDP all correspond to “strongly stable” policies that mix exponentially fast to a steady state.
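To make the idea of a policy that mixes exponentially fast to a steady state concrete, here is a minimal sketch for a scalar system. It is an illustration only, not the paper's algorithm or its SDP relaxation. The assumptions are mine: dynamics $x_{t+1} = a x_t + b u_t + w_t$ with zero-mean noise of variance `noise_var`, a fixed linear policy $u_t = k x_t$, and a single quadratic loss $q x^2 + r u^2$. All function and parameter names are hypothetical.

```python
def steady_state_variance(a, b, k, noise_var):
    """Stationary variance of x under x_{t+1} = (a + b*k) * x_t + w_t."""
    c = a + b * k  # closed-loop coefficient (assumed sign convention u = k*x)
    if abs(c) >= 1.0:
        raise ValueError("closed loop is not stable: |a + b*k| >= 1")
    return noise_var / (1.0 - c * c)


def variance_trajectory(a, b, k, noise_var, steps, var0=0.0):
    """Iterate v_{t+1} = c^2 * v_t + noise_var from the initial variance var0."""
    c = a + b * k
    v = var0
    trajectory = [v]
    for _ in range(steps):
        v = c * c * v + noise_var
        trajectory.append(v)
    return trajectory


def steady_state_cost(a, b, k, noise_var, q, r):
    """Expected per-step loss q*x^2 + r*u^2 at steady state, with u = k*x."""
    return (q + r * k * k) * steady_state_variance(a, b, k, noise_var)


if __name__ == "__main__":
    a, b, k, noise_var = 1.2, 1.0, -0.7, 1.0  # closed loop: a + b*k = 0.5
    v_star = steady_state_variance(a, b, k, noise_var)
    gaps = [abs(v - v_star) for v in variance_trajectory(a, b, k, noise_var, 10)]
    print("steady-state variance:", v_star)
    print("distance to steady state per step:", gaps)
```

In this scalar case the distance to the steady-state variance shrinks by a factor of $(a + b k)^2$ at every step, which is the exponential mixing the abstract refers to. The matrix setting studied in the paper requires a quantitative notion of stability that this sketch does not capture.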