Gain estimation of linear dynamical systems using Thompson Sampling

Matias I. Müller, Cristian R. Rojas
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1535-1543, 2019.

Abstract

We present the gain estimation problem for linear dynamical systems as a multi-armed bandit. This is particularly a very important engineering problem in control design, where performance guarantees are casted in terms of the largest gain of the frequency response of the system. The dynamical system is unknown and only noisy input-output data is available. In a more general setup, the noise perturbing the data is non-white and the variance at each frequency band is unknown, resulting in a two-dimensional Gaussian bandit model with unknown mean and scaled-identity covariance matrix. This model corresponds to a two-parameter exponential family. Within a bandit framework, the set of means is given by the frequency response of the system and, unlike traditional bandit problems, the goal here is to maximize the probability of choosing the arm drawing samples with the highest norm of its mean. A problem-dependent lower bound for the expected cumulative regret is derived and a matching upper bound is obtained for a Thompson-Sampling algorithm under a uniform prior over the variances and the two-dimensional means.

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-muller19a, title = {Gain estimation of linear dynamical systems using Thompson Sampling}, author = {M\"{u}ller, Matias I. and Rojas, Cristian R.}, booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics}, pages = {1535--1543}, year = {2019}, editor = {Chaudhuri, Kamalika and Sugiyama, Masashi}, volume = {89}, series = {Proceedings of Machine Learning Research}, month = {16--18 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v89/muller19a/muller19a.pdf}, url = {https://proceedings.mlr.press/v89/muller19a.html}, abstract = {We present the gain estimation problem for linear dynamical systems as a multi-armed bandit. This is particularly a very important engineering problem in control design, where performance guarantees are casted in terms of the largest gain of the frequency response of the system. The dynamical system is unknown and only noisy input-output data is available. In a more general setup, the noise perturbing the data is non-white and the variance at each frequency band is unknown, resulting in a two-dimensional Gaussian bandit model with unknown mean and scaled-identity covariance matrix. This model corresponds to a two-parameter exponential family. Within a bandit framework, the set of means is given by the frequency response of the system and, unlike traditional bandit problems, the goal here is to maximize the probability of choosing the arm drawing samples with the highest norm of its mean. A problem-dependent lower bound for the expected cumulative regret is derived and a matching upper bound is obtained for a Thompson-Sampling algorithm under a uniform prior over the variances and the two-dimensional means.} }
Endnote
%0 Conference Paper %T Gain estimation of linear dynamical systems using Thompson Sampling %A Matias I. Müller %A Cristian R. Rojas %B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Masashi Sugiyama %F pmlr-v89-muller19a %I PMLR %P 1535--1543 %U https://proceedings.mlr.press/v89/muller19a.html %V 89 %X We present the gain estimation problem for linear dynamical systems as a multi-armed bandit. This is particularly a very important engineering problem in control design, where performance guarantees are casted in terms of the largest gain of the frequency response of the system. The dynamical system is unknown and only noisy input-output data is available. In a more general setup, the noise perturbing the data is non-white and the variance at each frequency band is unknown, resulting in a two-dimensional Gaussian bandit model with unknown mean and scaled-identity covariance matrix. This model corresponds to a two-parameter exponential family. Within a bandit framework, the set of means is given by the frequency response of the system and, unlike traditional bandit problems, the goal here is to maximize the probability of choosing the arm drawing samples with the highest norm of its mean. A problem-dependent lower bound for the expected cumulative regret is derived and a matching upper bound is obtained for a Thompson-Sampling algorithm under a uniform prior over the variances and the two-dimensional means.
APA
Müller, M.I. & Rojas, C.R.. (2019). Gain estimation of linear dynamical systems using Thompson Sampling. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1535-1543 Available from https://proceedings.mlr.press/v89/muller19a.html.

Related Material