Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

Yeoneung Kim, Gihun Kim, Jiwhan Park, Insoon Yang
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:378-391, 2025.

Abstract

We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without relying on the restrictive assumptions that are often used in the literature.
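
To make the sampling step concrete, below is a minimal NumPy sketch of preconditioned Langevin dynamics for drawing an approximate posterior sample of the system matrices $\Theta = [A\ B]$, assuming a Gaussian noise model and ridge prior; the regularized Gram matrix of state-input regressors plays the role of the preconditioner, the quantity whose minimum eigenvalue the excitation signal drives up. The function name, step size, and prior are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def sample_theta_langevin(xs, us, n_steps=200, eta=0.05, lam=1.0, sigma2=1.0, seed=0):
    """Approximate posterior sample of Theta = [A B] for the system
    x_{t+1} = A x_t + B u_t + w_t,  w_t ~ N(0, sigma2 * I),
    via Langevin dynamics preconditioned by the regularized Gram matrix.
    Illustrative sketch only; names and schedules are assumptions.
    xs: (T+1, n) array of states, us: (T, m) array of inputs."""
    rng = np.random.default_rng(seed)
    n, m = xs.shape[1], us.shape[1]
    Z = np.hstack([xs[:-1], us])        # regressors z_t = [x_t; u_t]
    Y = xs[1:]                          # one-step targets x_{t+1}
    V = lam * np.eye(n + m) + Z.T @ Z   # preconditioner; excitation grows lambda_min(V)
    S = Y.T @ Z                         # sufficient statistic
    V_inv = np.linalg.inv(V)
    L = np.linalg.cholesky(V_inv)       # L @ L.T = V^{-1}
    Theta = S @ V_inv                   # initialize at the ridge estimate
    for _ in range(n_steps):
        # preconditioned gradient of the negative log posterior
        grad = (Theta @ V - S) @ V_inv
        noise = rng.standard_normal((n, n + m)) @ L.T
        Theta = Theta - eta * grad + np.sqrt(2.0 * eta * sigma2) * noise
    return Theta
```

Roughly, a Thompson-sampling loop would then solve the LQR problem for the sampled $\Theta$ and apply the resulting gain plus a small excitation term to the control input, so that the preconditioner's minimum eigenvalue keeps growing as data accumulates.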

Cite this Paper


BibTeX
@InProceedings{pmlr-v283-kim25b,
  title     = {Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret},
  author    = {Kim, Yeoneung and Kim, Gihun and Park, Jiwhan and Yang, Insoon},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  pages     = {378--391},
  year      = {2025},
  editor    = {Ozay, Necmiye and Balzano, Laura and Panagou, Dimitra and Abate, Alessandro},
  volume    = {283},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--06 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v283/main/assets/kim25b/kim25b.pdf},
  url       = {https://proceedings.mlr.press/v283/kim25b.html},
  abstract  = {We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without relying on the restrictive assumptions that are often used in the literature.}
}
Endnote
%0 Conference Paper
%T Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret
%A Yeoneung Kim
%A Gihun Kim
%A Jiwhan Park
%A Insoon Yang
%B Proceedings of the 7th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Necmiye Ozay
%E Laura Balzano
%E Dimitra Panagou
%E Alessandro Abate
%F pmlr-v283-kim25b
%I PMLR
%P 378--391
%U https://proceedings.mlr.press/v283/kim25b.html
%V 283
%X We propose a novel Thompson sampling algorithm that learns linear quadratic regulators (LQR) with a Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a carefully designed preconditioner and incorporates a simple excitation mechanism. We show that the excitation signal drives the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Furthermore, we establish nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without relying on the restrictive assumptions that are often used in the literature.
APA
Kim, Y., Kim, G., Park, J. & Yang, I. (2025). Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret. Proceedings of the 7th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 283:378-391. Available from https://proceedings.mlr.press/v283/kim25b.html.