Gain estimation of linear dynamical systems using Thompson Sampling
Proceedings of Machine Learning Research, PMLR 89:1535-1543, 2019.
Abstract
We present the gain estimation problem for linear dynamical systems as a multi-armed bandit. This is an important engineering problem in control design, where performance guarantees are cast in terms of the largest gain of the frequency response of the system. The dynamical system is unknown and only noisy input-output data is available. In a more general setup, the noise perturbing the data is non-white and the variance at each frequency band is unknown, resulting in a two-dimensional Gaussian bandit model with unknown mean and scaled-identity covariance matrix. This model corresponds to a two-parameter exponential family. Within a bandit framework, the set of means is given by the frequency response of the system and, unlike traditional bandit problems, the goal here is to maximize the probability of choosing the arm drawing samples with the highest norm of its mean. A problem-dependent lower bound for the expected cumulative regret is derived and a matching upper bound is obtained for a Thompson Sampling algorithm under a uniform prior over the variances and the two-dimensional means.
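The bandit setup described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm or code. It assumes each arm is a frequency whose two-dimensional mean holds the real and imaginary parts of the frequency response, with noise covariance equal to a variance times the identity. It also assumes that, under a flat prior on the mean and the variance and with n >= 3 samples from an arm, the posterior of the variance is inverse-gamma with shape n - 2 and scale S/2 (S being the summed squared deviation from the sample mean), and that the mean given the variance is Gaussian around the sample mean. The function name `thompson_gain`, its parameters, and the three initial pulls per arm are hypothetical choices made for illustration.

```python
import numpy as np


def thompson_gain(means, noise_std, horizon, seed=0):
    """Thompson Sampling sketch for picking the arm with the largest mean norm.

    means:     (K, 2) array, assumed real/imaginary parts of the response per arm
    noise_std: (K,) array of per-arm noise standard deviations (unknown to the learner)
    horizon:   total number of pulls (should be at least 3 * K)
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    noise_std = np.asarray(noise_std, dtype=float)
    K = means.shape[0]
    n_init = 3  # assumption: 3 pulls per arm so the variance posterior is proper

    counts = np.zeros(K, dtype=int)
    sums = np.zeros((K, 2))   # running sum of samples per arm
    sq_sums = np.zeros(K)     # running sum of squared norms per arm

    def pull(k):
        x = means[k] + noise_std[k] * rng.standard_normal(2)
        counts[k] += 1
        sums[k] += x
        sq_sums[k] += x @ x

    for k in range(K):
        for _ in range(n_init):
            pull(k)

    for _ in range(horizon - n_init * K):
        xbar = sums / counts[:, None]
        # S_k = sum_i ||x_i - xbar_k||^2, clipped to avoid round-off negatives
        scatter = np.maximum(sq_sums - counts * np.sum(xbar**2, axis=1), 1e-12)
        # Sample variance from inverse-gamma(shape = n - 2, scale = S / 2)
        var = (scatter / 2.0) / rng.gamma(counts - 2)
        # Sample mean from N(xbar, var / n * I)
        mu = xbar + np.sqrt(var / counts)[:, None] * rng.standard_normal((K, 2))
        # Play the arm whose sampled mean has the largest norm
        pull(int(np.argmax(np.linalg.norm(mu, axis=1))))

    xbar = sums / counts[:, None]
    return counts, np.linalg.norm(xbar, axis=1)


# Hypothetical example: three frequencies with gains 0.5, 1.0 and 2.0
counts, gain_estimates = thompson_gain(
    means=[[0.3, 0.4], [0.6, 0.8], [1.2, 1.6]],
    noise_std=[0.3, 0.3, 0.3],
    horizon=2000,
)
```

In this sketch, `counts` records how often each frequency was excited and `gain_estimates` holds the norm of each arm's sample mean; the entry for the most-pulled arm serves as the estimate of the largest gain.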