Learning the model-free linear quadratic regulator via random search

Hesameddin Mohammadi; Mihailo R. Jovanović; Mahdi Soltanolkotabi

Proceedings of the 2nd Conference on Learning for Dynamics and Control, PMLR 120:531-539, 2020.

Abstract

Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems as well as the lack of exact gradient computation. In this paper, we examine the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We provide theoretical bounds on the convergence rate and sample complexity of a random search method. Our results demonstrate that the required simulation time for achieving $\epsilon$-accuracy in a model-free setup and the total number of function evaluations are both $O(\log(1/\epsilon))$.

Cite this Paper

BibTeX


@InProceedings{pmlr-v120-mohammadi20a,
  title = 	 {Learning the model-free linear quadratic regulator via random search},
  author =       {Mohammadi, Hesameddin and Jovanovi{\'c}, Mihailo R. and Soltanolkotabi, Mahdi},
  booktitle = 	 {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
  pages = 	 {531--539},
  year = 	 {2020},
  editor = 	 {Bayen, Alexandre M. and Jadbabaie, Ali and Pappas, George and Parrilo, Pablo A. and Recht, Benjamin and Tomlin, Claire and Zeilinger, Melanie},
  volume = 	 {120},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--11 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v120/mohammadi20a/mohammadi20a.pdf},
  url = 	 {https://proceedings.mlr.press/v120/mohammadi20a.html},
  abstract = 	 {Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems as well as the lack of exact gradient computation. In this paper, we examine the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We provide theoretical bounds on the convergence rate and sample complexity of a random search method. Our results demonstrate that the required simulation time for achieving $\epsilon$-accuracy in a model-free setup and the total number of function evaluations are both of $O (\log \, (1/\epsilon) )$.}
}

Endnote

%0 Conference Paper
%T Learning the model-free linear quadratic regulator via random search
%A Hesameddin Mohammadi
%A Mihailo R. Jovanović
%A Mahdi Soltanolkotabi
%B Proceedings of the 2nd Conference on Learning for Dynamics and Control
%C Proceedings of Machine Learning Research
%D 2020
%E Alexandre M. Bayen
%E Ali Jadbabaie
%E George Pappas
%E Pablo A. Parrilo
%E Benjamin Recht
%E Claire Tomlin
%E Melanie Zeilinger
%F pmlr-v120-mohammadi20a
%I PMLR
%P 531--539
%U https://proceedings.mlr.press/v120/mohammadi20a.html
%V 120
%X Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems as well as the lack of exact gradient computation. In this paper, we examine the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We provide theoretical bounds on the convergence rate and sample complexity of a random search method. Our results demonstrate that the required simulation time for achieving $\epsilon$-accuracy in a model-free setup and the total number of function evaluations are both of $O (\log \, (1/\epsilon) )$.

APA


Mohammadi, H., Jovanović, M. R., & Soltanolkotabi, M. (2020). Learning the model-free linear quadratic regulator via random search. Proceedings of the 2nd Conference on Learning for Dynamics and Control, in Proceedings of Machine Learning Research 120:531-539. Available from https://proceedings.mlr.press/v120/mohammadi20a.html.
